Models for detecting the effect of adaptation on population genomic diversity are often predicated on a 
single newly arisen mutation sweeping rapidly to fixation. However, a population can also adapt to a new 
environment by multiple mutations of similar phenotypic effect that arise in parallel, at the same locus or 
different loci. These mutations can each quickly reach intermediate frequency, preventing any single one from 
rapidly sweeping to fixation globally, leading to a "soft" sweep in the population. Here we study various 
models of parallel mutation in a continuous, geographically spread population adapting to a global selection 
pressure. The slow geographic spread of a selected allele due to limited dispersal can allow other selected 
alleles to arise and start to spread elsewhere in the species range. When these different selected alleles meet, 
their spread can slow dramatically, and so initially form a geographic patchwork, a random tessellation, 
which could be mistaken for a signal of local adaptation. This spatial tessellation will dissipate over time 
due to mixing by migration, leaving a set of partial sweeps within the global population. We show that the 
spatial tessellation initially formed by mutational types is closely connected to Poisson process models of 
crystallization, which we extend. We find that the probability of parallel mutation and the spatial scale on 
which parallel mutation occurs is captured by a single compound parameter, a characteristic length, that 
reflects the expected distance a spreading allele travels before it encounters a different spreading allele. This 
characteristic length depends on the mutation rate, the dispersal parameter, the effective local density of 
individuals, and to a much lesser extent the strength of selection. While our knowledge of these parameters 
is poor, we argue that even in widely dispersing species, such parallel geographic sweeps may be surprisingly 
common. Thus, we predict that as more data becomes available, many more examples of intra-species parallel 
adaptation will be uncovered. 

1 Introduction 

There are many dramatic examples of convergent evolution across distantly related species, where a phenotype 
independently evolves via parallel changes at orthologous genetic loci (Wood et al., 2005; Arendt and Reznick, 
2008), indicating that adaptation can be strongly shaped by pleiotropic constraints (Haldane, 1932; Orr, 2005; 
Stern and Orgogozo, 2008; Kopp, 2009). There are also a growing number of examples of the parallel evolution 
of a phenotype within a species due to independent mutations at the same gene (Arendt and Reznick, 2008) 
(which are sometimes referred to as genetically redundant). Some of the best-studied examples come from the 
repeated evolution of resistance to insecticides within several insect species (ffrench-Constant et al., 2000), 
and the resistance of malaria to antimalarial drugs (Anderson and Roper, 2005; Pearce et al., 2009). Another 
example is the loss of pigmentation in Drosophila santomea through at least three independent mutations at a 
cis-regulatory element (Jeong et al., 2008), while the evolution of pigmentation within vertebrate species 
provides further examples (Kingsley et al., 2009; Gross et al., 2009; Protas et al., 2006). There are also a 
number of examples of parallel evolution within our own species (Novembre and Di Rienzo, 2009). For example, 
various G6PD mutations have spread in parallel in response to malaria (Tishkoff et al., 2001; Louicharoen et 
al., 2009), and lactase persistence has evolved independently in at least three different pastoral populations 
(Enattah et al., 2008; Tishkoff et al., 2007). A particularly impressive example in humans is offered by the 
sickle cell allele at the β-globin gene that confers malaria resistance, where multiple changes have putatively 
occurred at a single base pair (see Flint et al., 1998, for discussion). In each of these examples, multiple, 
independent mutations have led to the same or functionally equivalent adaptive phenotype, although they differ 
in the degree to which the functional consequences and equivalences of the different mutations have been 
explored. Such repeated adaptive evolution via similar changes within a species, which we term parallel 
adaptation, may therefore be common. As we will also address repeated evolution of a similar phenotype via 
changes at different genetic loci, this could more broadly be termed "convergent adaptation" (Arendt and 
Reznick, 2008). 

In many of these examples the selection pressure is patchy and rates of gene flow are low, increasing 
the chance of parallel adaptation. However, parallel adaptation can occur even in a panmictic population. 
For example, adaptation may occur from multiple independent copies of the selected allele present in 
standing variation at mutation-selection balance within the population (Orr and Betancourt, 2001; Hermisson 
and Pennings, 2005). Even when there is no standing variation for a trait in a panmictic population, a 
selected allele could arise independently several times during the course of a selective sweep, if mutation 
is sufficiently fast relative to the spread of the selected allele. This idea was formalized by Pennings and 
Hermisson (2006a,b), who showed that such soft sweeps may be expected when the population scaled mutation 
rate (the product of the effective population size and mutation rate) towards the adaptive allele is greater 
than 1. Thus, repeated mutation may be quite common for species with large populations, or where the mutation 
target is large, e.g. knocking out of a gene. Pennings and Hermisson showed that the number of independently 
arisen selected alleles in a sample has approximately the Ewens distribution (Pennings and Hermisson, 2006a), 
and properties of neutral variation at a closely linked site can be derived from this (Pennings and Hermisson, 
2006b). Such a selective sweep has been termed a soft sweep, as the population can adapt without the dramatic 
reduction in diversity at linked selected sites that is usually associated with a full sweep (Maynard Smith 
and Haigh, 1974); see Hermisson and Pfaffelhuber (2008), Pennings and Hermisson (2006a), Pennings and 
Hermisson (2006b), and Pritchard et al. (2010) for discussion, and Schlenke and Begun (2005) or Jeong et al. 
(2008) for potential examples. 

Clearly, if parallel mutations can occur during adaptation in a large panmictic population, then limited 
dispersal should further increase the chance of parallel adaptation, as other mutations can arise and 
spread during the time it takes one to move across the species range. Intuitively, a low rate of dispersal 
and a large mutational target should increase the chance of parallel adaptation (as in Coop et al., 2009; 
Novembre and Di Rienzo, 2009), but it is unclear exactly how other dispersal, population and mutational 
parameters play into the probability of parallel adaptation. In the absence of a formal model, many 
simple questions remain: Does parallel adaptation only occur in species with strong population structure? 
Weak selection pressures lead to slowly spreading mutations; is parallel adaptation more likely in this case? 
This leaves us unable to understand the likelihood of parallel adaptation in particular examples (such as 
Flint et al., 1998) and more generally its role in geographic patterns of adaptation (such as Coop et al., 
2009). 



Here we study parallel adaptation in a homogeneous, geographically spread population. We focus on the 
case where a population is exposed to a novel selection regime throughout a homogeneous species range, 
and the population is initially entirely devoid of standing variation for the trait, assumptions that favor the 
fixation of only a single new allele in the population. We use simple approximations to derive theoretical 
results for the properties of parallel adaptation in a continuous spatial population with strong migration 
for a range of dispersal distributions (also called dispersal kernels, including fat-tailed examples). We are 
able to describe fairly completely the resulting patterns, and show that they are well captured by a single 
compound parameter combining the rate of mutation and the speed at which the mutation spreads. For 
an introduction to the patterns of genetic diversity that can be expected from such geographic structure at 
both neutral and selected loci, see Charlesworth et al. (2003), Novembre and Di Rienzo (2009), and 
Lenormand (2002). 



We show that when population sizes are sufficiently large and dispersal distances are small compared to 
the species range, parallel adaptation within a species is likely to be common, and quantify this relationship. 
Furthermore, we describe how separately-arisen mutations will — at least for some time — leave behind a 
spatial pattern reflecting their separate origins. 

The structure of this paper is as follows. In Section 2 we introduce and analyze our model of a continuous 
population, first in the classical context, and then in a more general context that allows for accelerating waves 
(arising from fat-tailed dispersal distributions). In Section 3.1 we present the results of some simulations 
of the continuous process, intended to assess the robustness of our results to deviations from the assumptions. 
In Sections 3.2 and 3.3 we present and discuss the theoretical results in a few biologically reasonable 
contexts, providing numerical results to illustrate how the different parameters play into the probability of 
parallel adaptation. In Section 4 we discuss consequences and extensions. Some mathematical arguments 
are postponed until the Appendix. 

1.1 Modeling assumptions 

Here we describe the assumptions behind our model and give some background, before introducing in Section 
2 the model we analyze. First, we assume each mutation under consideration confers a selective advantage 
such that, upon appearing in the population, it quickly rises locally to some equilibrium frequency. Second, 
there is significant spatial structure, namely, migration is weak enough that the selected trait reaches an 
equilibrium frequency locally before spreading to the entire population. Third, the parallel mutations are 
distinguishable, and confer the same selective benefit. Fourth, these mutations are neutral relative to each 
other, in the sense that in a population at equilibrium frequency (e.g. fixation) for any collection of these 
mutations, the dynamics of their relative proportions occur on a longer time scale than their dynamics in 
the original background (examples are given below). We call this last assumption allelic exclusion, since it 
implies that areas fixed for one adaptive allele will not be rapidly overtaken by another. 

Under these assumptions, a newly arisen advantageous mutation, if it is initially successful, will spread 
through the population in a more-or-less wavelike manner (more on this later). If another allele conferring 
the same advantage arises in a location the first has not yet reached, then the two waves spread towards each 
other and will at some point collide. What happens when they collide will generally depend on the details of 
their epistatic interaction, or, if they occur at a single site, on their dominance interaction. However, by our 
assumption of allelic exclusion, the dynamics are slower than the spread of the selected alleles. This allows 
us to neglect the slower mixing of types and genetic drift that will happen in this phase, instead focusing on 
the first process by which independently arisen alleles partition the population. 

In Figure 1 we show a cartoon to illustrate our model, and in Figures [S] and [H] we show the results of a 
simulation (described in Section 3.1). 

Allelic exclusion The allelic exclusion assumption is fundamental to our approach. It will hold, for 
instance, if there is a single advantageous mutation, and we treat each time it arises independently as a 
distinct allele, identifiable by examination of linked neutral variation. It will also hold if mutations at multiple 
sites within a gene are genetically redundant, such as loss of function mutations, and no additional selective 
benefit is conferred by having a mutation at more than one site (though this may be an approximation, since 
even loss of function changes within the same gene may differ in their characteristics, as in Rosenblum et al. 
(2010)). 



Another important consequence of allelic exclusion is that a mutation occurring in a location where the 
advantageous allele already exists in large numbers is unlikely to persist or achieve high frequency. Indeed, 
if the interaction is neutral and there already exist in the same location 999 other individuals with the 
selected trait, then a new mutation will contribute on average only 0.001 of the future population, and has 
high probability of being lost from the population by drift. This fact allows us to ignore all new mutations 
that occur after any selected allele has risen in that location to a nonzero frequency. In particular, the shape 
of the wave front will not be important, only how its leading edge spreads. Below, for convenience we often 
talk about the probability or rate of local fixation, but it follows from this observation that we need only 
require that the allele escapes loss from the population by drift and that some intermediate equilibrium 
frequency is reached, as would occur in the case of overdominance. 



Selection We also assume that the advantageous, derived alleles have a reproductive advantage of $(1 + s)$ 
relative to the ancestral type. In practice, in a diploid model with dominance or epistasis, or in the presence of 
density dependence, we require that both the manner in which a new mutation escapes drift, and the way that 
it subsequently spreads through the population, be well-approximated by the simple haploid (or additive) 
model. Roughly speaking, this holds if the growth and spread of the allele is driven by growth where the 
allele is at very low frequency (and primarily occurring in heterozygotes). This implies that the probability 
a new mutation escapes drift is well-approximated by $2s$ divided by the variance in offspring number (which 
is quite robust to the details of spatial structure (Maruyama, 1970, 1974)), and that per-capita growth is 
fastest when at low frequency. In the usual formulation of diploid systems (Aronson and Weinberger, 
1978), this is satisfied if the fitness advantage of the homozygote is no more than twice the fitness advantage 
of the heterozygote. In other cases, e.g. an Allee effect, the behavior can be quite different; see Stokes 
(1976). 
Figure 1: A cartoon representation of our model of spatial parallel mutation. In the top row, each 
panel represents a 2D species range with time increasing from the left to right panel. In the bottom row, a 
1D species range is represented by the vertical axis and time is the horizontal axis, with more recent times 
closer to the right side of the page. Stars represent a new mutation arising and escaping drift. The three 
colors represent the area occupied by three different alleles. Note that (I) and (II) are not different views of 
the same process, although they are similar. 



1.2 Background on the wave of advance 

We model the spread of a selected allele by making use of existing work on traveling waves, a link first 
established independently by Fisher (1937) and by Kolmogorov, Petrovskii and Piscunov (1937). 
We introduce and review the wave of advance literature here, as much of the subsequent development has 
occurred in fields other than population genetics. Suppose that individuals produce a random number of 
offspring with mean $r$, that offspring disperse a random distance with standard deviation $\sigma$, and let 
$p(t, x)$ be the expected proportion of mutants at time $t$ and location $x$. Suppose also that the selection 
coefficient $s$ is small and the advantage is additive, and that the population density $\rho$ is fairly large. 
Both papers argued that if the dispersal distance is Gaussian, or if $\sigma$ is small (so that the "long-time" 
dispersal distribution is Gaussian), then barring the appearance of new mutations, the time evolution of $p$ 
is well-described by the reaction-diffusion equation now known as the Fisher-KPP equation. 



$$\frac{\partial}{\partial t} p(t,x) = \frac{r\sigma^2}{2} \sum_{i=1}^{d} \frac{\partial^2}{\partial x_i^2} p(t,x) + r s\, p(t,x)\bigl(1 - p(t,x)\bigr), \qquad (1)$$



where $d$ is the dimension of the species range. They furthermore showed in $d = 1$ that a "wave of advance" 
occurs as the solution to this equation, and that for initial conditions where the allele is only polymorphic 
within a spatially bounded region, the solution moves asymptotically with speed $v = r\sigma\sqrt{2s}$. 
Kolmogorov et al. (1937) also covered the more general case in which $p(t,x)(1 - p(t,x))$ is replaced by 
$F(p(t,x))$ for an appropriate function $F$, which gives the density dependent growth rate of the selected 
type, subject to certain conditions. 
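
To give a feel for the magnitudes involved, the asymptotic speed $v = r\sigma\sqrt{2s}$ can be evaluated 
directly. The following is a minimal sketch; the function names `wave_speed` and `crossing_time` and the 
example parameter values are illustrative assumptions, not part of the model description, and the crossing 
time ignores the initial establishment phase of the wave. 

```python
import math


def wave_speed(r: float, sigma: float, s: float) -> float:
    """Asymptotic Fisher-KPP wave speed v = r * sigma * sqrt(2 s).

    r     : mean number of offspring (per generation)
    sigma : standard deviation of the dispersal distance
    s     : (small, additive) selection coefficient
    """
    if r <= 0 or sigma <= 0 or s <= 0:
        raise ValueError("r, sigma and s must all be positive")
    return r * sigma * math.sqrt(2.0 * s)


def crossing_time(distance: float, r: float, sigma: float, s: float) -> float:
    """Generations needed to cover `distance` at the asymptotic speed.

    Illustrative only: the time for the allele to establish locally
    before the wave forms is not included.
    """
    return distance / wave_speed(r, sigma, s)


if __name__ == "__main__":
    # Hypothetical values: s = 0.02 and sigma = 1 distance unit per generation.
    print(wave_speed(1.0, 1.0, 0.02))
    print(crossing_time(1000.0, 1.0, 1.0, 0.02))
```

Because the speed grows only as the square root of $s$, a fourfold increase in the selection coefficient 
merely doubles the speed of the wave. 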

For many other choices of dispersal distribution and growth function $F$ the advancing front of a new 
type also approaches a constant wave shape that advances at constant speed through time (a "traveling 
wave" solution), but with a speed not given by the same formula. Then the frequency of individuals of the 
selected type at $x$ at time $t$ can be expressed $p(t, x) = h(x - vt)$, where $h(\cdot)$ gives the shape of the 
wave and $v$ is its speed. These traveling wave solutions have been studied for the Fisher-KPP equation for a 
range of appropriate $F$ (Aronson and Weinberger, 1975); the speed can often be found more easily than the 
wave shape (Hadeler and Rothe, 1975). Radially symmetric solutions also exist, in which the new type travels 
outwards from an initial origin; the behavior of such radially spreading waves depends on initial conditions, 
but will asymptotically move with the same constant speed and fixed wave shape as in one dimension. 

Since the introduction of the Fisher-KPP equation, traveling wave solutions to reaction-diffusion equations 
have been studied in the ecological literature as a model of invading species (Skellam, 1951; Kot et al., 
1996), as well as in a range of other fields. See Aronson and Weinberger (1973) for some classical theory, 
general discussion and context, or Volpert et al. (1994) for a more extensive reference. Related models, 
using integrodifference or integrodifferential equations, have been used by various authors to include 
various important biological factors such as age structure and fluctuating environments (Neubert et al., 2000; 
Neubert and Caswell, 2000; Kot and Neubert, 2008); see Hastings et al. (2005) or Zhao (2009) for 
a review. Density regulation is often discussed in these models, but important behaviors can usually be 
determined by a linearization, based on how the new type grows when rare. Common to these models is the 
existence of traveling wave solutions, whose forms and speeds are often known only implicitly; most natural 
models of the spread of a new selected type can be translated into one of these frameworks. There is also 
a fruitful connection of these Fisher-KPP models to branching random walks that is beyond the scope of 
this article; see McKean (1975); Biggins (1979, 1995) and Kot et al. (2004). A similar model, the contact 
process, has also been widely studied in the probabilistic literature; see Bramson et al. (1989). 

The qualitative behavior of the spread of an organism or an allele in a population can depend on the 
organism's dispersal kernel, defined as the probability density of the distance between mother and child's birth 
locations (see Shigesada and Kawasaki (1997) or Cousens et al. (2008) for discussion). Most mathematical 
models of invasions assume that the dispersal kernel has tails bounded by an exponential, and obtain 
a constant wave speed. In some species this is appropriate, while in others, rare, long-distance migration 
events are important (Shigesada and Kawasaki, 1997). In such organisms, dispersal may be better modeled 
by a kernel that is not bounded by an exponential (i.e. a "fat-tailed" kernel), although there is generally 
insufficient evidence so far (Cousens et al., 2008, Ch. 5). Mollison (1972) showed that in a certain model, 
if the kernel is fat-tailed the range occupied by the expanding type will be patchy and will grow faster 
than linearly: the spread accelerates and eventually moves faster than any constant-speed traveling wave. 
Moreover, Lewis and Pacala (2000) have established a link between leptokurtic kernels (kernels whose 
kurtosis exceeds that of the standard Gaussian) and patchy invasion dynamics. Leptokurtic but exponentially 
bounded kernels can lead to waves that initially accelerate but settle to a constant speed. We shall see 
that the important behavior of the model is not determined by the asymptotic, long-time speed of the wave, 
but rather its behavior at intermediate times. Therefore, kernels that have similar short-time behavior but 
different long-time behavior can give rise to similar dynamics on the scale we are interested in. Consideration 
of other wave behaviors leads to a more general model, which we study in Section 2.3. 
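
To make the notion of a leptokurtic kernel concrete, the sketch below computes the kurtosis (fourth moment 
divided by the squared second moment) of two symmetric one-dimensional kernels by numerical integration. 
The choice of the Laplace (double exponential) kernel as the comparison, the function names, and the 
integration settings are all assumptions made for illustration. The Laplace kernel is leptokurtic but still 
exponentially bounded, so it is an example of the intermediate case described above rather than of a 
fat-tailed kernel. 

```python
import math


def gaussian_pdf(x: float, sigma: float = 1.0) -> float:
    """Gaussian dispersal kernel with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def laplace_pdf(x: float, b: float = 1.0) -> float:
    """Laplace (double exponential) kernel with scale b (illustrative choice)."""
    return math.exp(-abs(x) / b) / (2.0 * b)


def kurtosis(pdf, half_width: float = 40.0, n: int = 80_000) -> float:
    """Kurtosis E[x^4] / E[x^2]^2 of a symmetric, mean-zero kernel.

    Uses the midpoint rule on [-half_width, half_width]; n is even so that
    no integration cell straddles the kink of the Laplace kernel at 0.
    """
    h = 2.0 * half_width / n
    m0 = m2 = m4 = 0.0
    for k in range(n):
        x = -half_width + (k + 0.5) * h
        w = pdf(x) * h
        m0 += w
        m2 += w * x * x
        m4 += w * x ** 4
    # Dividing by m0 corrects for the (tiny) numerical normalization error.
    return (m4 / m0) / (m2 / m0) ** 2


if __name__ == "__main__":
    print("Gaussian kurtosis:", kurtosis(gaussian_pdf))
    print("Laplace kurtosis: ", kurtosis(laplace_pdf))
```

The Gaussian kernel has kurtosis 3 and the Laplace kernel has kurtosis 6, so the latter places relatively 
more weight on long dispersal distances at equal variance. 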

The models reviewed above are haploid models; traveling waves in diploid models have been much less 
studied. Aronson and Weinberger (1975) show that in the diploid analogue to Equation (1), if the 
difference in selection coefficient is small, then allele frequency dynamics are approximately governed by 
Equation (1). If local populations are in Hardy-Weinberg equilibrium, then more general results apply 
demonstrating the existence of traveling waves (Weinberger, 1982; Zhao, 2009). If dispersal occurs over a 
distance comparable to the width of the wave then this will no longer be the case, and while recently developed 
general theory (Zhao, 2009) might be applied, the existence and characterization of traveling waves in other 
diploid models is to our knowledge an open question. However, we certainly expect the behavior to be 
wavelike, and since our theory takes wave behavior as an input, we have no qualms about using our model 
to discuss the diploid organisms considered later in the paper. 



2 Methods 



Consider a population with continuous spatial distribution. After the change initiating the novel selection 
regime (which occurs at $t = 0$), selected mutations can arise at random throughout the species range and, 
if they escape loss due to genetic drift, start to expand radially out from their origination point. Under our 
assumption of allelic exclusion, a new mutation can only arise and spread in areas not already reached 
by another selected allele. 

We first examine the case where the wave travels with constant velocity, and then in Section 2.3 study 
the more general case where the speed of spread of the selected allele is not constant through time, proving 
our results in this more general setting. It turns out that the properties of the final pattern of types can be 
conveniently summarized by a single compound parameter, a characteristic length. In the constant speed 
case, models with different parameters will result in patterns with identical properties when the geographic 
distance is scaled in units of this characteristic length. This is similar to the use of effective population size 
in models of genetic drift, for which populations with very different sizes have identical rates of drift when 
time is scaled by the effective population size. This characteristic length can be defined as the distance 
traveled by an unobstructed spreading wave before it is expected that one other successful mutation would 
have arisen within the area so far enclosed. 

Any such expanding waves of selected alleles in real populations are subject to stochastic fluctuations. 
Initially, we ignore this point, but in Section 2.4 we show that if stochasticity is taken into account, first 
order properties such as the mean number of types only depend on the speed of the mean wave, in a way we 
make explicit. 



2.1 The mutational process 

Imagine a large, haploid population with $\rho$ individuals per unit area distributed uniformly over some range 
$U$. The spatial range $U$ may be one- or two-dimensional, but must be connected (i.e. not composed of 
disjoint pieces). We allow the number of offspring produced by each individual to be random, with mean 
$r$ and variance $\xi^2$, and suppose that each offspring of a nonmutant carries some beneficial mutation with 
probability $\mu$. Mutants have an additive advantage of $s$ relative to nonmutants and additional mutations 
have no effect; to fix things, suppose that nonmutants reproduce at rate 1 (so time is scaled in units of 
generations), while mutants reproduce at rate $1 + s$, and that each individual reproduces independently of 
the others and of the state of the population. We will frequently make use of the fact that the probability 
of local fixation of a single new mutant, since $s$ is assumed to be small, is well approximated by $2s/\xi^2$. 

The set of times and locations $(t_i, x_i)$ at which new mutants appear is a random set of points in space-time. 
Under our assumptions, it is well approximated by a Poisson point process in $[0, \infty) \times U$ with constant 
intensity $\rho\mu r$, the mutation rate per unit area per generation. (Recall that a Poisson point process with 
constant intensity is a random collection of points with the property that the number of points in any set has 
a Poisson distribution with mean equal to the area of the set multiplied by the intensity, and the numbers of 
points in any two nonintersecting sets are independent of each other.) By the "thinning" property of Poisson 
processes, the points of origin of new mutations that will be successful if no other mutant type has already 
colonized the location also form a Poisson point process in $[0, \infty) \times U$ with constant intensity 
$\lambda = (\rho\mu r)(2s/\xi^2)$ mutations per unit area per generation. 
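
The thinned process is straightforward to simulate. The sketch below computes the intensity 
$\lambda = (\rho\mu r)(2s/\xi^2)$ and draws the space-time origins of potentially successful mutations on a 
one-dimensional range $[0, L]$ up to a finite time horizon. The function names, the restriction to one 
dimension, and the parameter values are illustrative assumptions. It uses the standard construction in which 
arrival times form a Poisson process of total rate $\lambda L$ and each arrival receives an independent 
uniform location. 

```python
import random


def successful_mutation_rate(rho: float, mu: float, r: float,
                             s: float, xi2: float) -> float:
    """Intensity lambda = (rho * mu * r) * (2 s / xi^2).

    rho * mu * r is the rate of new mutants per unit area per generation;
    2 s / xi^2 is the probability that a new mutant escapes loss by drift.
    """
    return rho * mu * r * (2.0 * s / xi2)


def sample_space_time_points(lam: float, length: float, t_max: float,
                             rng: random.Random):
    """Homogeneous Poisson process on [0, t_max] x [0, length].

    Returns a time-ordered list of (time, location) pairs. Waiting times
    between arrivals are exponential with total rate lam * length, and
    locations are independent and uniform on the range.
    """
    points = []
    t = rng.expovariate(lam * length)
    while t <= t_max:
        points.append((t, rng.uniform(0.0, length)))
        t += rng.expovariate(lam * length)
    return points


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    lam = successful_mutation_rate(rho=10.0, mu=1e-6, r=1.0, s=0.05, xi2=1.0)
    origins = sample_space_time_points(lam, length=1e4, t_max=1e3,
                                       rng=random.Random(1))
    print(lam, len(origins))
```

These points are only candidate origins: under allelic exclusion, a candidate that falls in an area already 
occupied by another selected allele does not found a new type. 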

2.2 Constant wave speed 

As reviewed earlier, if $\rho$ is large enough that stochastic effects are small, the selected type will spread 
as an expanding wave with constant wave speed $v = r\sigma\sqrt{2s}$. The resulting picture is of waves 
spreading radially out from the site of each successful mutation, until they collide with each other. Rather 
than formally prove the convergence of some discrete model, we will take this natural continuous model as our 
starting point for analysis: successful mutations arise as the Poisson process described in Section 2.1 and 
instantly begin to spread radially at speed $v$. 

This natural model of parallel adaptation has sources of new waves (new successful mutations) arising as 
a Poisson process in space and time, and each wave expanding outwards until encountering another wave, 
where they come to a halt (see Figure 1). This model has been studied before (again, first by Kolmogorov 
(1937)) as a model of crystallization, in which nucleation sites form at random points in time and space 
and initiate the radial growth of new crystals. More generally, the speed of the wave and the intensity of 
the Poisson process may not be constant, in which case the final tessellation of space determined by the 
crystals is known as the Kolmogorov-Johnson-Mehl-Avrami tessellation (Fanfoni and Tomellini, 1998). 
The properties of these crystal shapes have been extensively studied in the case of constant-speed waves by 
Møller (1992, 1995) and others (Bollobás and Riordan, 2008; Gilbert, 1962). 



To make this model precise, suppose that mutant type $i$ arose at location $x_i$ and time $t_i$. At each later 
time $t$, let $A_i(t)$ denote the area that type $i$ has spread to by time $t$ that was not reached first by any 
other type $j$. This is formally defined by 

$$A_i(t) = \bigl\{ x \in U : \|x - x_i\| - v(t - t_i) < \min\bigl(0, \|x - x_j\| - v(t - t_j)\bigr) \ \ \forall j \neq i \bigr\}, \qquad (2)$$

where $\|x - y\|$ is the Euclidean distance from $x$ to $y$. The dynamics of the collection of areas $A_i$, for 
origins $(t_i, x_i)$ distributed as a homogeneous Poisson process, is the (Kolmogorov-)Johnson-Mehl process. 
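
The construction of the areas $A_i(t)$ can be sketched in a few lines for a one-dimensional range. In the 
code below, the function names `accepted_origins` and `owner` are hypothetical, and the example candidate 
points are invented for illustration. The sketch assumes a constant speed $v$ and a convex range, so that a 
location is occupied by whichever accepted type has the earliest arrival time $t_i + \|x - x_i\|/v$. 

```python
def accepted_origins(candidates, v):
    """Filter candidate origins (t, x) under allelic exclusion.

    A candidate founds a new type only if its location has not already
    been reached by the wave of an earlier accepted type.
    """
    accepted = []
    for t, x in sorted(candidates):
        covered = any(abs(x - x0) <= v * (t - t0) for t0, x0 in accepted)
        if not covered:
            accepted.append((t, x))
    return accepted


def owner(x, t, accepted, v):
    """Index i such that x lies in A_i(t), or None if x is still unoccupied.

    Type i owns x at time t if its wave has arrived (arrival < t) and it
    arrived before the wave of every other accepted type.
    """
    best, best_arrival = None, None
    for i, (t0, x0) in enumerate(accepted):
        arrival = t0 + abs(x - x0) / v
        if arrival < t and (best_arrival is None or arrival < best_arrival):
            best, best_arrival = i, arrival
    return best


if __name__ == "__main__":
    # Three hypothetical candidates; the second arises inside the first wave.
    candidates = [(0.0, 0.0), (1.0, 0.5), (1.0, 5.0)]
    types = accepted_origins(candidates, v=1.0)
    print(types)
    print([owner(x, 10.0, types, v=1.0) for x in (2.0, 4.0)])
```

In this example the two surviving waves meet at $x = 3$, where the arrival times of the two types coincide, 
and the second candidate is discarded because the first wave has already passed its location. 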

One objection to this picture is that the waves do not really begin to spread instantly at speed $v$: they first 
need to reach equilibrium (e.g. fixation) locally, which takes a random time of order $\log(\rho\sigma^d)/s$ 
generations, and only converge to the equilibrium wave speed after the effect of initial conditions dies out. 
Translating the points of a Poisson process by independent random amounts produces another Poisson process, 
but this one will no longer be homogeneous in time. However, if the standard deviation of the time to local 
fixation is small relative to $\lambda$, this has little effect. Another objection is that the shape of the wave 
itself is stochastic, especially at the beginning. We will ignore this detail and return to it in Section 2.4. 
The simulations in Figure 7 and others for a range of parameters (not shown) show that the waves quickly begin 
to spread at a constant speed, after a random delay with relatively low variance, reassuring us that our 
assumptions are reasonable. 

In principle, any property of the model can now be found through calculations with Poisson processes, 
although for general species ranges $U$ the formulas are complicated. All results in this section can be derived 
more or less explicitly by giving a population genetics interpretation to results in Møller (1992); they will 
also follow from our results of Section 2.3, where the proofs are given, and which generalize the Johnson-Mehl 
model in a different direction than previously done by Møller (1992). 



It turns out that all properties can be summarized fairly simply, especially in this constant speed case. 
Recall that $\lambda = 2s\rho\mu r/\xi^2$ is the spatial density of successful mutations per generation, 
$v = r\sigma\sqrt{2s}$ is the speed of the wave, and let $\omega(d)$ be the area of the ball of radius 1 in $d$ 
dimensions, so $\omega(1) = 2$ and $\omega(2) = \pi$. We define the characteristic length $\chi$, which will be 
useful in a moment, as 

$$\chi = \left( \frac{v}{\lambda\,\omega(d)} \right)^{1/(d+1)} = \left( \frac{\sigma\xi^2}{\rho\mu\sqrt{2s}\,\omega(d)} \right)^{1/(d+1)}. \qquad (3)$$

Fix some habitat shape $U$ in $d$ dimensions ($d$ will be 1 or 2) with diameter (the maximum distance between 
any two points in $U$) equal to 1, and let $U(a)$ be a habitat with the same shape, scaled by the factor $a$ 
(and hence diameter $a$). Since the process with parameters $(v, \lambda, U(a))$ can be realized by rescaling 
space by $1/a$ and time by $\lambda a^d$ in the process with parameters $(v/(a^{d+1}\lambda), 1, U(1))$, the 
distribution of the final configuration of types within $U(a)$ (specifically, $\{A_i(\infty)\}_i$) is a function 
of the single number 

$$\frac{\lambda a^{d+1}\omega(d)}{v} = \frac{\rho\mu\sqrt{2s}\,\omega(d)\,a^{d+1}}{\sigma\xi^2} = \left( \frac{a}{\chi} \right)^{d+1}. \qquad (4)$$

The quantity $(a/\chi)^{d+1}$ is, up to a constant, the expected number of other mutations to arise in an area 
of diameter $a$ in the time it takes the wave to cover that area. Furthermore, $\chi$ is the distance traveled by 
an unobstructed spreading wave before it is expected that one other successful mutation would have arisen 
within the area enclosed so far. (This last interpretation is the reason for the appearance of $\omega(d)$.) 
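
The characteristic length is simple to evaluate numerically. The sketch below computes it in two ways: 
from the demographic parameters $(\rho, \mu, s, \sigma, \xi^2)$, and from the rate $\lambda$ and speed $v$ 
as $(v/(\lambda\,\omega(d)))^{1/(d+1)}$, assuming $\lambda = 2s\rho\mu r/\xi^2$ and $v = r\sigma\sqrt{2s}$ so 
that $r$ cancels. The function names and the example parameter values are illustrative assumptions only. 

```python
import math


def unit_ball_volume(d: int) -> float:
    """omega(d): volume of the unit ball in d dimensions (2 for d=1, pi for d=2)."""
    return math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0)


def characteristic_length(rho: float, mu: float, s: float,
                          sigma: float, xi2: float, d: int) -> float:
    """chi = (sigma * xi^2 / (rho * mu * sqrt(2 s) * omega(d)))^(1/(d+1))."""
    omega = unit_ball_volume(d)
    return (sigma * xi2 / (rho * mu * math.sqrt(2.0 * s) * omega)) ** (1.0 / (d + 1))


def characteristic_length_from_rates(lam: float, v: float, d: int) -> float:
    """Equivalent form chi = (v / (lam * omega(d)))^(1/(d+1))."""
    return (v / (lam * unit_ball_volume(d))) ** (1.0 / (d + 1))


def scaled_range_size(a: float, chi: float, d: int) -> float:
    """The single number (a / chi)^(d+1) governing the final configuration."""
    return (a / chi) ** (d + 1)


if __name__ == "__main__":
    # Hypothetical values for a one-dimensional range.
    chi = characteristic_length(rho=2.5, mu=1e-6, s=0.02, sigma=1.0, xi2=1.0, d=1)
    print(chi, scaled_range_size(5000.0, chi, d=1))
```

Note the weak dependence on selection: $s$ enters only through $(2s)^{-1/(2(d+1))}$, whereas the mutation 
rate, density and dispersal distance enter with exponent $\pm 1/(d+1)$. 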

A critical characteristic of parallel adaptation is the density of unique mutations per unit area. This
mean density of types $\nu(d)$ can be calculated exactly, and clearly displays the role of $\chi$. We define
$\nu(d)$ as the mean number of successful types arising in a region divided by the area of that region,
assuming the species range is the infinite range $\mathbb{R}^d$ to avoid edge effects. (This turns out to be
independent of the region used.) In two dimensions, we get from equation (9) (after the integrals worked out
in Appendix B.1) that

$$\nu(2) = \frac{3^{1/3}\,\Gamma(4/3)}{\pi}\,\chi^{-2},$$

where $\Gamma$ is the gamma function, while in one dimension, $\nu(1) = \chi^{-1}\sqrt{\pi/8}$. In general,
the mean number of successful types in a $d$-dimensional region of total area $A$ will be $A/\chi^d$ up to a
constant depending only on the shape of the region.
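The constant-speed densities can be checked numerically. The sketch below is an illustration under stated assumptions rather than the original computation: it integrates $\nu = \lambda\int_0^\infty \exp(-\lambda h(t))\,dt$ with $h(t) = \omega(d)v^d t^{d+1}/(d+1)$ by the midpoint rule, and compares the result with closed forms written in terms of $\chi$. The closed forms used here were obtained by carrying out that integral directly; the truncation point and step count are arbitrary choices.

```python
import math

def omega(d):
    return {1: 2.0, 2: math.pi}[d]

def density_numeric(lam, v, d, n_steps=20000):
    """Midpoint-rule estimate of lam * integral_0^inf exp(-lam * h(t)) dt."""
    c = lam * omega(d) * v ** d / (d + 1)          # lam * h(t) = c * t**(d+1)
    t_max = (50.0 / c) ** (1.0 / (d + 1))          # integrand is below exp(-50) beyond this
    dt = t_max / n_steps
    total = sum(math.exp(-c * ((k + 0.5) * dt) ** (d + 1)) for k in range(n_steps))
    return lam * total * dt

def density_closed_form(lam, v, d):
    """Closed forms in terms of the characteristic length chi (d = 1 or 2)."""
    chi = (v / (lam * omega(d))) ** (1.0 / (d + 1))
    if d == 1:
        return math.sqrt(math.pi / 8.0) / chi
    if d == 2:
        return 3.0 ** (1.0 / 3.0) * math.gamma(4.0 / 3.0) / (math.pi * chi ** 2)
    raise ValueError("only d = 1 or d = 2 are handled in this sketch")

for lam, v, d in [(0.003, 0.4, 1), (1e-4, 2.0, 2)]:
    print(d, density_numeric(lam, v, d), density_closed_form(lam, v, d))
```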

The mean area occupied by a successful mutation in $d$ dimensions is $1/\nu(d)$. In $d = 2$ there does not
seem to be a nice expression for the variance of this area, but numerical computation and simulation indicate
that the distribution is not highly skewed: the areas occupied by different mutations are comparable (see
e.g. Figures 5 and 6). Møller (1992) also computes many other quantities in this constant-speed setting,
such as the mean density of interfaces between adjacent types, or the mean number of neighbors of a given
mutation; we do not consider these here.

A related quantity is the distance from a sample point to the origin of the mutation that eventually
covers it, which relates to the total number of others sharing a sampled type. The mean and variance (and
all other moments) of this can be computed in a similar way to Equation (10) in Appendix B.1.

Besides the spatial scale, the other most important datum is the time scale of adaptation. Although
the final distribution of types only depends on the characteristic length, the speed at which adaptation
spreads through the population depends differently on the parameters. A convenient summary of this time
is $\tau$, defined as the time before a chosen point is hit by an expanding wave. (For the point $x$, this is
$\tau = \inf\{t \ge 0 : x \in \bigcup_i A_i(t)\}$.) In the infinite $d$-dimensional range $\mathbb{R}^d$, the
time $\tau$ has a distribution such that $\tau^{d+1}$ is an exponential:

$$P\{\tau > t\} = \exp\left(-\frac{\lambda v^d \omega(d)\,t^{d+1}}{d+1}\right),$$

so the expected time after selection begins until a chosen location has been reached by an adaptive mutation
is

$$E[\tau] = \left(\lambda v^d \omega(d)\right)^{-1/(d+1)}\,\Gamma\!\left(\frac{1}{d+1}\right)(d+1)^{-d/(d+1)}. \tag{6}$$

Note that the crucial quantity here in, say, two dimensions, is

$$\left(\lambda v^2\right)^{-1/3} = \frac{\xi^{2/3}}{(2s)^{2/3}(\rho\mu\sigma^2)^{1/3}},$$

which depends much more strongly on the selection coefficient $s$ than does the characteristic length.
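To make the contrast in the dependence on $s$ concrete, the following Python sketch evaluates Equation (6) alongside the characteristic length and reports how each changes when $s$ is quadrupled in two dimensions. The parameter values and function names are illustrative assumptions, and $\xi^2 = 1$ is assumed by default.

```python
import math

def rates(sigma, rho, mu, s, xi2=1.0):
    """Return (lambda, v) for the constant-speed model."""
    lam = 2.0 * s * rho * mu / xi2
    v = sigma * math.sqrt(2.0 * s)
    return lam, v

def expected_arrival_time(sigma, rho, mu, s, d, xi2=1.0):
    """E[tau] from Equation (6)."""
    omega = {1: 2.0, 2: math.pi}[d]
    lam, v = rates(sigma, rho, mu, s, xi2)
    return ((lam * v ** d * omega) ** (-1.0 / (d + 1))
            * math.gamma(1.0 / (d + 1))
            * (d + 1) ** (-d / (d + 1)))

def characteristic_length(sigma, rho, mu, s, d, xi2=1.0):
    omega = {1: 2.0, 2: math.pi}[d]
    lam, v = rates(sigma, rho, mu, s, xi2)
    return (v / (lam * omega)) ** (1.0 / (d + 1))

# Assumed, illustrative values; only the ratios matter here.
base = dict(sigma=10.0, rho=2.0, mu=1e-6, d=2)
time_ratio = expected_arrival_time(s=0.04, **base) / expected_arrival_time(s=0.01, **base)
length_ratio = characteristic_length(s=0.04, **base) / characteristic_length(s=0.01, **base)

# In d = 2 the time scales as s**(-2/3) and the length as s**(-1/6).
print(time_ratio, length_ratio)
```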

2.3 The general case 

The results of the previous section can be applied to any model having waves of advance with a constant
wave speed, substituting this speed for $v$. However, real dispersing populations often show important effects
of rare, long-distance dispersal events (such as Coyne et al., 1982; Clark, 1998). This implies that the
dispersal distribution should in some sense be "fat-tailed", but since data on rare events is hard to obtain,
there is no consensus on what distributions best model real dispersal (Kot et al., 1996). Since the behavior
of the wave depends on the shape of the dispersal kernel (waves from fatter-tailed kernels may tend to
accelerate initially), it is important to explore wave behaviors other than the simple constant-speed case
(Kot et al., 1996). This is more technically demanding, but we find that the spatial properties can again
be meaningfully described by a single characteristic length.

As mentioned before, Mollison showed that if the dispersal kernel is not bounded by a decaying exponential
(which we now refer to as "fat-tailed"), the range occupied by the expanding type expands at an
ever-increasing speed (Mollison, 1972). Although the resulting behavior is termed an accelerating wave
(Kot et al., 1996), there is no "traveling wave solution" in the same precise sense as before; thus, we
imagine that the frequency $p(x,t)$ is radially symmetric and decreasing as $\|x\|$ increases for large enough
$t$ and $\|x\|$; and that for some fixed low frequency, $\epsilon$, the distance $f(t)$ at which the wave front
first rises above a frequency $\epsilon$ (i.e. $f(t) = \sup\{\|x\| \ge 0 : p(x,t) > \epsilon\}$) is reasonably
well-behaved. In practice, we assume the existence of such an $f(t)$, and refer to this as the leading edge,
and proceed from there. Note that it is not necessary to know the entire wave shape, only the speed of its
leading edge, because of our assumption of mutation exclusion. Note also that $f$ depends on $\epsilon$,
although in many cases the dependence only introduces a constant; see Section [231 for examples.

This leads to the following more general model. As before, the population inhabits a region $U$ with
uniform population density $\rho$, and successful mutations occur at the points of a rate
$\lambda = 2s\rho\mu/\xi^2$ Poisson point process in $[0, \infty) \times U$ (time and space). Any mutation that
occurs in an unoccupied area begins to occupy the area around it, occupying by time $t$ any previously
unoccupied points no more than distance $f(t)$ away, where $f$ is an increasing function with $f(0) = 0$ and
$f(t) \to \infty$ as $t \to \infty$. The function $f(t)$, the wave expansion profile, is the radius of an
uninterrupted wave after time $t$, and a point at distance $r$ from the origin of a mutation will be reached
after $f^{-1}(r)$ time units.

However, this model is not quite completely specified: unlike in the constant-speed case, the collision
of two waves traveling at different speeds presents a delicate situation. Even laying aside questions of how
the waves interact (does their interface grow more quickly?), the resulting patterns are not described by
the simple analogue of equation (2), essentially because if waves are accelerating, then one mutation may
surround another. The problem is depicted in Figure 2. In defining the regions $A_i$ associated with each type
$i$ in the original way, we say that each mutation will occupy by time $t$ any previously unoccupied points no
more than distance $f(t)$ away. However, as is shown in the last time slice of Figure 2, this requires the path
leading to the occupation of certain points to pass through regions already occupied by another type. In the
constant-speed case, it was always clear which points were reached first by each mutation (for a proof, see
Møller (1992)). A natural definition of the regions $A_i$ that jibes with the "wave-like" intuition would be
that at time $t$, region $i$ expands outwards (perpendicularly to its boundary) at rate $f'(t - t_i)$, if it is
not blocked by another region. This definition would allocate the grey region in Figure 2 to the (blue)
mutation that arose second. Unfortunately, this model seems significantly more difficult to analyze, due to
possible interactions between many mutational origins. Both definitions are likely equally good approximations
to the true dynamics, which are inherently stochastic, especially in the accelerating wave case
(Mollison, 1972). Furthermore, if allowing mutations to interfere with each other only slows the waves down,
the original definition will be more conservative than another that allows interference, in that it results in
strictly fewer independent mutational origins.

Figure 2: The delicate situation of accelerating waves. A. Three panels showing a 2D species range
at three time points. B. A 1D cross section through the 2D range, with time along the vertical axis. Stars
represent new mutations, and colors code different types. At time $T_1$, only the red mutation is present, and
is expanding slowly. At time $T_2$, the red mutation is expanding quickly, but the blue mutation has just
appeared and is still expanding slowly. At time $T_3$, the red mutation has outflanked the blue mutation, but
it is not clear which mutation should have claimed the grey area. Each point in the grey area would have
been reached first by the red mutation if unobstructed (trajectory a), but if forced to detour around the blue
mutation, the shortest path (trajectory b) is long enough to allow the blue mutation to arrive first.

With this in mind, we stick with the simple analogue of equation (2), defining formally for $t > t_i$

$$A_i(t) = \left\{x \in U : f^{-1}(\|x - x_i\|) - (t - t_i) < \min\left(0,\; f^{-1}(\|x - x_j\|) - (t - t_j)\right)\ \ \forall j \ne i\right\}. \tag{7}$$



This preserves the same intuition as before, with the modification that waves may now move through each 
other invisibly to claim an area on the other side of a distinct region, which is unrealistic, but results in an 
underestimate of the number of independent parallel mutations. 
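The first-arrival rule of Equation (7) is easy to state in code. The sketch below is a minimal one-dimensional illustration, assuming a hypothetical accelerating profile $f(t) = t^2$ (so $f^{-1}(r) = \sqrt{r}$) and two hand-picked mutations; neither choice comes from the original analysis. It shows a later mutation holding only a small enclave while the earlier, faster wave claims points on the far side of it.

```python
import math

def owner(x, t, mutations, f_inverse):
    """Index of the type holding point x at time t under the first-arrival rule,
    or None if no wave has reached x yet. `mutations` is a list of (t_i, x_i)
    pairs for mutations that are assumed to have arisen in unoccupied territory."""
    arrivals = [t_i + f_inverse(abs(x - x_i)) for t_i, x_i in mutations]
    first = min(range(len(arrivals)), key=arrivals.__getitem__)
    return first if arrivals[first] < t else None

# Hypothetical accelerating profile f(t) = t**2, so f_inverse(r) = sqrt(r).
f_inverse = math.sqrt

# Type 0 arises at time 0 at x = 0; type 1 arises at time 3 at x = 10, just
# before type 0 would have arrived there (at time sqrt(10), about 3.16).
mutations = [(0.0, 0.0), (3.0, 10.0)]

for x in (5.0, 10.01, 12.0):
    print(x, owner(x, t=100.0, mutations=mutations, f_inverse=f_inverse))
```

Here the point at 12.0 is assigned to type 0 even though type 1 lies between it and the origin of type 0, which is the "moving through each other invisibly" behavior described above.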

Although this more general model has also been studied in the context of crystallization, where it is
generally known as "Kolmogorov-Johnson-Mehl-Avrami" dynamics (Fanfoni and Tomellini, 1998), the
focus is on phase transitions and how the proportion of occupied space increases with time (governed by the
Avrami equation). The statistics of the number and shape of the regions in this more general setting seem
to have so far been unaddressed.

Any quantity we might be interested in follows in principle from a calculation with Poisson point
processes; see Kingman (1993) or Daley and Vere-Jones (2003) for background. Figure 3 depicts
the general procedure; it will be useful in visualizing the following argument. For instance, fix a time $t_0$
and a point $x_0$ that is at least distance $f(t_0)$ from the boundary. The location $x_0$ is covered by a
mutant type if at some time previously, a mutation arose nearby enough that it could have reached that location
in the time elapsed since it arose. If the mutation arose $t$ time units in the past, it must be within distance
$f(t)$. Therefore, to see if a point $x_0$ has not yet been covered at time $t_0$, we need only look back
through time to see if, at each time $t$ units in the past, the corresponding circle with radius $f(t)$ is
empty of new mutations. A trajectory of radially expanding circles moving back through time sweeps out a cone
in space-time (whose sides are not straight if the speed is not constant); we denote by $h(t)$ the area (in
space-time) of such a cone of height $t$, defined by
$h(t) = |\{(s,y) : 0 \le s \le t \text{ and } \|x - y\| < f(t - s)\}|$. The area of this cone multiplied by
the population density represents the total number of individuals a mutation could have arisen in. Since
successful mutations form a Poisson point process in space and time, we know that the number of successful
mutations in such a cone is Poisson distributed with mean $\lambda h(t)$.

Figure 3: Cones in space-time. A single 1D example shown as a space-time diagram. (I) Mutations
arise and spread forward in time. (II) Mutations can only spread if a cone backward in time is free from
other mutations. Mutational origins are marked with stars; A and B are successful. Mutation A arises at
time $s$; the probability that it is the first at that point depends on the area of the cone stretching
backwards from it (shaded); this cone has base $f(s)$ and area $h(s)$. Mutation C does not spread because it
occurs where B has already reached; we can see this because point B lies in the cone stretching backwards from
C. The point $x_0$ is first occupied at time $\tau$; two quantities of interest are $X$, the distance from
$x_0$ to the origin of the mutation that eventually encloses it, and $R = f(\tau)$. Note that the "cones" will
not generally have straight sides unless $f(t) = vt$.

This simple fact gives us the distribution of the time until some (any) adapted type arrives at a given
location. Denote by $\tau$ the time that $x_0$ is first reached by an adaptation, shown in Figure 3. (Formally,
$\tau = \inf\{t \ge 0 : x_0 \in \bigcup_i A_i(t)\}$.) Then $\tau > t$ if the cone with point at $(t, x_0)$ is
empty of successful mutations, and so

$$P\{\tau > t\} = \exp(-\lambda h(t)),$$

and so if the species range $U$ is infinite (we mean $\mathbb{R}^d$), the expected time until $x_0$ is adapted is

$$E[\tau] = \int_0^\infty \exp(-\lambda h(t))\,dt.$$
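For a wave expansion profile without a closed-form cone volume, both the survival probability and the expected arrival time can be approximated by simple quadrature. The following Python sketch is one way to do so, under stated assumptions: the cone volume is taken as $h(t) = \int_0^t \omega(d) f(u)^d\,du$, the outer integral is truncated at a user-supplied `t_max`, and the accelerating profile $f(t) = t^2$ is a hypothetical example chosen only because its answer is known in closed form.

```python
import math

def survival(f, lam, d, t, n_steps=2000):
    """P{tau > t} = exp(-lam * h(t)), with h(t) computed by the midpoint rule."""
    omega = {1: 2.0, 2: math.pi}[d]
    du = t / n_steps
    h = omega * du * sum(f((k + 0.5) * du) ** d for k in range(n_steps))
    return math.exp(-lam * h)

def expected_arrival_time(f, lam, d, t_max, n_steps=20000):
    """Midpoint-rule estimate of integral_0^t_max exp(-lam * h(t)) dt.
    t_max must be large enough that the integrand is negligible beyond it."""
    omega = {1: 2.0, 2: math.pi}[d]
    dt = t_max / n_steps
    h = 0.0       # cone volume accumulated up to the left edge of the current step
    total = 0.0
    for k in range(n_steps):
        left = k * dt
        h_mid = h + 0.5 * dt * omega * f(left + 0.25 * dt) ** d   # h at the step midpoint
        total += math.exp(-lam * h_mid) * dt
        h += dt * omega * f(left + 0.5 * dt) ** d                  # h at the right edge
    return total

def constant_speed(t, v=1.0):
    return v * t

def accelerating(t):
    return t ** 2      # hypothetical accelerating profile, for illustration only

lam = 0.01
print(expected_arrival_time(constant_speed, lam, 1, t_max=200.0))
print(expected_arrival_time(accelerating, lam, 1, t_max=50.0))
```

In one dimension with constant speed $v$ the result should agree with $\tfrac{1}{2}\sqrt{\pi/(\lambda v)}$, which gives a convenient check on the step size.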



The entire process is parameterized by three things: the wave expansion profile $f(t)$, the mutation
intensity $\lambda$, and the region $U$. However, by changing variables as before, we can remove $\lambda$ and
the scaling of $U$: the process characterized by $(f(t), \lambda, U)$ is equivalent, up to a linear scaling of
time and space, to the process characterized by $(f(t/(\lambda x^d))/x, 1, U/x)$. ($U/x$ is the region $U$
rescaled by the numeric factor $x$.) In the constant speed case $f(t) = vt$, we are then left with a single
parameter: $f(t/(\lambda x^d))/x = x^{-(d+1)}vt/\lambda = (\chi/x)^{d+1}\omega(d)\,t$, where
$\chi = (v/(\lambda\omega(d)))^{1/(d+1)}$ is the characteristic length of equation (3), and recall that
$\omega(d)$ is the area of the sphere with radius 1 in $d$ dimensions. This suggests defining the
characteristic length $\chi$ more generally to satisfy the equation

$$\lambda\,\omega(d)\,f^{-1}(\chi)\,\chi^d = 1.$$

This has a natural interpretation. An unobstructed mutation in $t$ units of time will cover an area with radius
$f(t)$. The expected number of other mutations that occur in that area over that time period is
$\lambda\omega(d)\,t\,f(t)^d$. After the wave has traveled distance $\chi$, this expected number is
$\lambda\omega(d)f^{-1}(\chi)\chi^d = 1$. Thus, $\chi$ is the distance traveled by an unobstructed spreading
wave before it is expected that one other successful mutation would have arisen within the area enclosed so
far.
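When $f^{-1}$ has no convenient closed form, the defining condition $\lambda\,\omega(d)\,f^{-1}(\chi)\,\chi^d = 1$ can be solved numerically. The sketch below uses plain bisection, relying only on the fact that the left-hand side increases with $\chi$. The function name, the starting bracket, and the example profiles (including the hypothetical accelerating profile $f(t) = t^2$) are assumptions made for illustration.

```python
import math

def general_characteristic_length(f_inverse, lam, d):
    """Solve lam * omega(d) * f_inverse(chi) * chi**d = 1 for chi by bisection."""
    omega = {1: 2.0, 2: math.pi}[d]

    def expected_others(x):
        # Expected number of other successful mutations arising in the area
        # enclosed by an unobstructed wave that has traveled distance x.
        return lam * omega * f_inverse(x) * x ** d

    lo, hi = 0.0, 1.0
    while expected_others(hi) < 1.0:     # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):                 # far more halvings than double precision needs
        mid = 0.5 * (lo + hi)
        if expected_others(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

v, lam = 2.0, 1e-3
chi_constant = general_characteristic_length(lambda x: x / v, lam, d=2)   # f(t) = v t
chi_accel = general_characteristic_length(math.sqrt, lam, d=1)            # f(t) = t**2
print(chi_constant, chi_accel)
```

For the constant-speed profile the result should coincide with $(v/(\lambda\omega(d)))^{1/(d+1)}$.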

Regardless of the wave expansion profile, if the range is large relative to $\chi$, there will be many
mutations with high probability; and conversely, a range small relative to $\chi$ will have few mutations with
high probability. Indeed, by the time a single wave has traveled distance $\chi$, in any other circle of radius
$\chi$ there is a good chance that another mutation has already occurred, so that the chance a single mutation
manages to fix everywhere before another arises is small.

2.3.1 Global properties 

Now we derive formulas for a few other quantities of interest: the mean density of successful mutations;
the expected area covered by a typical mutation; and properties of the distance from a chosen point to its
mutational origin. We provide these results and their derivations to give more intuition for the Poisson point
process and space-time cones. Møller treated only the constant speed case but allowed inhomogeneity in
time; it seems that the case of a general wave expansion profile $f$ does not appear in the literature. Less
mathematically inclined readers may at this point skip to Section 2.5 or even Section 3.1 without much
loss of continuity.

We assume that the region $U$ is $\mathbb{R}^d$. In general, the shape of the region will have some effect on
these quantities, but if the region is large in all directions relative to $\chi$ then the effect will be small.
Also, without loss of generality, we can now rescale time so that the mutational flux $\lambda = 1$ (which also
affects the wave expansion profile $f$). When we compute numerical examples, we'll need to remember that time
is in units of $1/\lambda$ generations.

We will make much use of the volume of the "cone" in space-time swept out by the expanding mutation
over $t$ time units, defined to be $h(t) = \int_0^t \omega(d) f(u)^d\,du$, and depicted in Figure 3. We will
also want to know the volume of the cone with radius $r$ at the base, which (abusing notation a bit) we define
by

$$h(r) = \int_0^{f^{-1}(r)} \omega(d) f(u)^d\,du.$$

First consider $\nu$, the mean number of successful mutations per unit area. Let $g(t, x)$ be the probability
that a successful mutation arose at location $x$ and time $t$, and that no other mutation reached $x$ earlier.
The mean number of successful mutations originating within a region $W$ is the integral of $g(t, x)$ over
$[0, \infty) \times W$. The probability that some mutation arose in a small region of size $\epsilon$ about
$(t, x)$ is approximately $\epsilon$; and the probability that no other mutation had already reached that point
is $\exp(-h(t))$. Since this does not depend on $x$, the mean number of successful mutations originating in $W$
is equal to the area of $W$ multiplied by $\nu$, which we now know to be

$$\nu = \int_0^\infty \exp(-h(t))\,dt. \tag{9}$$

A subregion $W$ of total area $|W|$ will have on average $\nu|W|$ successful mutations arising within it
(although note that mutations not arising within $W$ could invade).
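The density of successful mutations can also be estimated by direct simulation of the Poisson process with exclusion. The sketch below does this for a constant-speed wave on a one-dimensional ring (periodic boundaries are an assumption made here to sidestep edge effects) and compares the simulated density with the value $\lambda\int_0^\infty \exp(-\lambda v t^2)\,dt = \tfrac{1}{2}\sqrt{\pi\lambda/v}$ implied by Equation (9) in original time units. The parameter values, the stopping time, the random seed, and the number of replicates are arbitrary illustrative choices.

```python
import math
import random

def count_successful_mutations(lam, v, length, t_end, rng):
    """Simulate candidate mutations on a ring and count those that arise in
    territory not yet reached by any earlier successful mutation."""
    accepted = []                                  # (time, position) of successful mutations
    t = 0.0
    while True:
        t += rng.expovariate(lam * length)         # next candidate anywhere on the ring
        if t > t_end:
            break
        x = rng.uniform(0.0, length)
        covered = False
        for t_j, x_j in accepted:
            gap = abs(x - x_j)
            gap = min(gap, length - gap)           # distance around the ring
            if gap <= v * (t - t_j):               # an earlier wave has already arrived
                covered = True
                break
        if not covered:
            accepted.append((t, x))
    return len(accepted)

lam, v, length = 0.5, 1.0, 400.0
t_end = math.sqrt(40.0 / (lam * v))     # late enough that uncovered points are very unlikely
rng = random.Random(1)
replicates = 5
total = sum(count_successful_mutations(lam, v, length, t_end, rng) for _ in range(replicates))
simulated_density = total / (replicates * length)
predicted_density = 0.5 * math.sqrt(math.pi * lam / v)
print(simulated_density, predicted_density)
```

Because the estimate is stochastic, agreement should only be expected to within sampling error, which shrinks as the ring length or the number of replicates grows.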

Now consider the area finally occupied by a "typical" successful mutation, which we denote by $A$.
(Technically, with distribution given by the Palm measure.) Since the total area of a region divided by the
number of mutations in the region converges to $E[A]$ as the size of the region increases, and can also be
seen to converge to $1/\nu$, we also now know that
$E[A] = 1/\nu = \left(\int_0^\infty \exp(-h(t))\,dt\right)^{-1}$.

Finally, fix some point $x_0$. Let $X$ be the distance from $x_0$ to the origin of the mutation that eventually
encloses it, let $\tau$ be the time until $x_0$ is first covered, and let $R = f(\tau)$ (depicted in Figure 3).
As in the constant-speed case, the probability that $x_0$ has not yet been reached by time $t$ is
$\exp(-h(t))$, and so the distribution of $\tau$ is given by $P\{\tau > t\} = \exp(-h(t))$. Then to have
$X = x$ and $R = r$, we need a mutation to have arisen at the appropriate time somewhere in the ring of radius
$x$ about $x_0$, and no other mutation to have already arisen in the cone whose point is at $(\tau, x_0)$. The
cone is empty with probability $\exp(-h(r))$, and since the ring of radius $x$ and width $dx$ has area
$d\,\omega(d)x^{d-1}dx$ (and the number of points in it is Poisson distributed), the contribution to the joint
probability density of $X$ and $R$ is $d\,\omega(d)x^{d-1}$ (note that each $d$ appearing in this last
expression denotes the dimension). This joint density is then

$$P\{X \in dx, R \in dr\} = \exp(-h(r))\,d\,\omega(d)\,x^{d-1}\,dx\,dr, \qquad 0 < x < r. \tag{10}$$

Integrating over possible values of $R$, we get a density for $X$:

$$P\{X \in dx\} = d\,\omega(d)\,x^{d-1}\,dx \int_x^\infty e^{-h(r)}\,dr. \tag{11}$$

Changing the order of integration, the moments of $X$ can be written

$$E[X^n] = d\,\omega(d) \int_0^\infty e^{-h(r)} \int_0^r x^{n+d-1}\,dx\,dr = \frac{\omega(d)\,d}{n+d} \int_0^\infty r^{n+d} e^{-h(r)}\,dr. \tag{12}$$



2.4 Stochastic waves 



We have treated the wave expansion as deterministic, but in real biological systems, the true dynamics will
be stochastic. Established theory for Poisson processes allows us to at least write down analogous expressions
in the general, stochastic case. For example, if the selection coefficient varies over a sufficiently small
range that mutational exclusion approximately holds, we could model each type as having a randomly chosen
speed. Alternatively, the shape itself could be random, with long-distance migrants causing discrete patches
to appear outside of the main spread of the wave in a stochastic manner. Happily, it turns out that if we
make the same definition as at the start of Section 2.3 to avoid the "delicate situation," then the equations
for $\nu$, the distribution of $\tau$, and $E[A]$ all still hold, after replacing $h(t)$ by the mean volume
swept out over $t$ time units, as we prove in Appendix A.

2.5 Some fat-tailed kernels 

Dispersal kernels that lead to accelerating waves are thought to be important for the spread of real organisms
(Shigesada and Kawasaki, 1997). A consequence of the existence of a characteristic length is that the
long-time behaviour of a wave has little effect: most waves spread no farther than a small multiple of $\chi$,
and hence different $f$ that agree over the scale of $\chi$ will result in similar patterns. Since even
exponentially bounded kernels can lead to waves that accelerate for some time before reaching the
asymptotically constant speed, it is of interest to look at different wave expansion profiles, regardless of
what we believe about the tail of the dispersal kernel.

Kot et al. (1996) divide the class of fat-tailed distributions into those distributions with finite moments
of all orders, and those without. Since our interest is in the resulting patterns of diversity, and it is not
our intention to analyze the precise behavior of the wave under different models of dispersal, in Appendix B
we will follow Fourier transform methods of Kot et al. (1996) to somewhat heuristically obtain expressions
for the wave expansion profile $f(\cdot)$ for two families of fat-tailed distributions without moment
generating functions: the stretched exponentials (with moments), and the Lévy symmetric stable distributions
(without moments). We then apply the theory of Section [^751. These results are used in Section [5751 to
compare the three families at some biologically relevant values.

All kernels have a scaling parameter, denoted by $\sigma$, which is the scale on which distance is measured.
To then compare different kernels, we need to standardize them to each other; however, we can't match
standard deviations because for some of these kernels, the standard deviation is not defined. We have chosen
to standardize so that the interquartile ranges (IQR, the difference between the 75th and the 25th
percentiles, equivalent here to the median absolute deviation) match those of the standard Gaussian. See
Figure 4 for a depiction of the resulting kernels.

Figure 4: Probability densities for several kernels. Shown are a standard Gaussian, stretched
exponentials with $\alpha$ values i, ^, and ^, and the Cauchy distribution (symmetric stable with
$\alpha = 1$). Each is scaled to match the 75% quantile of the standard Gaussian.
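The interquartile-range standardization can be illustrated for two kernels whose quantiles are available in closed form. The sketch below is an assumption-laden example, not the procedure used for the figure: it matches the 75% quantile of a Cauchy kernel and of a Laplace kernel (density proportional to $\exp(-|x|/b)$, used here only as a convenient exponentially bounded comparison) to that of the standard Gaussian. Because all three kernels are symmetric about zero, matching the 75% quantile also matches the IQR.

```python
import math
from statistics import NormalDist

# 75% quantile of the standard Gaussian (roughly 0.674).
GAUSS_Q75 = NormalDist().inv_cdf(0.75)

def cauchy_cdf(x, scale):
    return 0.5 + math.atan(x / scale) / math.pi

def laplace_cdf(x, scale):
    tail = 0.5 * math.exp(-abs(x) / scale)
    return 1.0 - tail if x >= 0 else tail

# Cauchy: the 75% quantile is scale * tan(pi / 4) = scale.
cauchy_scale = GAUSS_Q75
# Laplace: the 75% quantile is scale * ln(2).
laplace_scale = GAUSS_Q75 / math.log(2.0)

print(cauchy_cdf(GAUSS_Q75, cauchy_scale), laplace_cdf(GAUSS_Q75, laplace_scale))
```

For kernels without closed-form quantiles, the same matching could be done by numerically inverting the cumulative distribution function.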



3 Results 

3.1 Simulations 

To test the robustness of the theory to deviations from the assumptions, we implemented simulations in
R (www.r-project.org). In the simulations, the population is a rectangular grid of demes with $N$ haploid
individuals in each. Each generation, each individual independently produces either one or no offspring; the
probability she produces an offspring is $r$ if she is of the ancestral type, and it is $r(1 + s)$ if she is of
any mutant type. Each offspring is a new, as-yet-unseen type with probability $\mu$; otherwise it is the same
type as the parent. Next, all individuals have the chance to migrate, which they do independently of each other
with probability $m$ to a nearest-neighbor deme. We also used a truncated power-law dispersal kernel, which
gave similar results, which we do not display. Those that migrate outside the population are lost. Finally,
the demes are resampled down to size $N$. As in the Wright-Fisher model, death does not occur explicitly,
but only during the resampling phase. The history of a typical run on a one-dimensional grid is shown in
Figure 5 with time in the vertical direction, and several time slices of a typical two-dimensional simulation
are shown in Figure 6.
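The generation cycle described above can be sketched in a few lines. The following Python code is a hedged re-implementation of one generation on a linear array of demes; it is not the original R code, and it makes several assumptions that the text leaves open: parents persist alongside offspring until resampling, migrants choose the left or right neighbor with equal probability, and a deme holding fewer than $N$ individuals after migration is left as it is. All names are hypothetical.

```python
import random

ANCESTRAL = 0

def one_generation(demes, N, r, s, mu, m, rng, next_type):
    """Advance a linear array of demes by one generation.
    Each deme is a list of integer type labels. Returns (new_demes, next_type).
    Requires r * (1 + s) <= 1 so that it is a valid probability."""
    n = len(demes)
    pooled = [[] for _ in range(n)]
    for i, deme in enumerate(demes):
        everyone = list(deme)                     # assumption: parents persist until resampling
        for kind in deme:
            p = r if kind == ANCESTRAL else r * (1.0 + s)
            if rng.random() < p:                  # one offspring or none
                if rng.random() < mu:             # offspring is a new, as-yet-unseen type
                    everyone.append(next_type)
                    next_type += 1
                else:
                    everyone.append(kind)
        for kind in everyone:                     # nearest-neighbor migration
            j = i
            if rng.random() < m:
                j = i + rng.choice((-1, 1))
            if 0 <= j < n:                        # migrants leaving the range are lost
                pooled[j].append(kind)
    # Resample each deme down to size N (sampling without replacement).
    new_demes = [rng.sample(d, N) if len(d) > N else d for d in pooled]
    return new_demes, next_type

rng = random.Random(1)
demes = [[ANCESTRAL] * 50 for _ in range(20)]
next_type = 1
for _ in range(100):
    demes, next_type = one_generation(demes, N=50, r=0.4, s=0.1, mu=1e-4, m=0.05,
                                      rng=rng, next_type=next_type)
print(next_type - 1, "mutant types arose;", sorted({k for d in demes for k in d}))
```

A two-dimensional grid would follow the same pattern with four neighbors per deme.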

The simulations use discrete generations and discretized space, in contrast to our theory, and so provide
better support for the approximations we use. The discrete model is, however, in the domain of attraction of
the classical wave of advance: if we rescale time so each generation lasts $\epsilon$, rescale space so the
grid spacing is $\sqrt{\epsilon}$, and make selection weak, setting $s = s'\epsilon$, then in the limit as
$\epsilon \to 0$ and $N \to \infty$ in the appropriate manner, we expect the frequency of mutant types to
satisfy the Fisher-KPP equation (1) with $s = s'$ and $\sigma^2 = m\,2^{-(d-1)}$.

Figure 5: A space-time plot of a single run of a simulation on a linear array of 500 demes each of size 100
over 20,000 generations. The parameters were $s = 0.1$, $m = 0.01$, $\mu$ = 4 x 10~^, and migration was
nearest-neighbor. Time runs down the page; different colors label different types, and areas occupied by more
than one type are colored by a mixture of the colors (local drift is strong in this simulation, so most demes
have only one type). Each distinct "cone" has a unique type despite similarities in color choice. Note that
types expand at roughly constant speed until encountering another type, and that mixing, while present, happens
on a longer timescale. Types that appear where the advantageous type is already fixed (e.g. the orange bit
between the purple and blue regions on the left) are unlikely to survive, even if they locally escape drift.
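For orientation, the wave speed suggested by the Fisher-KPP limit can be computed from the simulation parameters. The sketch below assumes the relation $\sigma^2 = m\,2^{-(d-1)}$ for nearest-neighbor migration on a $d$-dimensional grid together with $v = \sigma\sqrt{2s}$, and it ignores any effect of the reproduction rate $r$ on the time scale, so it should be read as a rough guide rather than a prediction for the discrete model.

```python
import math

def dispersal_variance(m, d):
    """Per-coordinate dispersal variance for nearest-neighbor migration with
    probability m on a d-dimensional grid (grid spacing 1)."""
    return m * 2.0 ** (-(d - 1))

def predicted_wave_speed(m, s, d):
    """Fisher-KPP speed sigma * sqrt(2 s), in demes per generation."""
    return math.sqrt(dispersal_variance(m, d)) * math.sqrt(2.0 * s)

# Parameters matching the one-dimensional run of Figure 5 (s = 0.1, m = 0.01).
speed = predicted_wave_speed(m=0.01, s=0.1, d=1)
print(speed, "demes per generation;", 500 / speed, "generations to cross 500 demes")
```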



We first used simulations to investigate how well the wave behavior predicted by the Fisher-KPP equation
approximates the wave speed of this discrete model. Deviations in the wave speed and the wavelike form of the
spread are likely to be the largest sources of error in our approximations. The results from a representative
set of simulations are shown in Figure 7, performed on a linear grid of 1000 populations, with $s = 0.1$,
$N = 1000$, $r = 0.4$, nearest-neighbor dispersal, $\mu = 0$, and an initial seed of 20 mutant individuals at
the origin. For local dispersal the wave speed was constant, after a short transient period of random length,
as the trajectories on the left of Figure 7 show. The speed found was 1-2 times faster than predicted, likely
because our simulations have discrete generations (r was typically between -tj and i) and discrete demes,
rather than the continuous space and time of the Fisher-KPP equation. However, the wave speed does
depend linearly on $m$ and $\sqrt{s}$ as predicted across different dispersal distributions (results not
shown), which we view as the important verification.

Figure 6: Six time slices of an example simulation in a two dimensional range, showing initial
establishment and expansion of types, and the beginning of mixing (which happens much slower than
expansion). The population was composed of a 60 x 60 grid of demes with 1000 individuals in each. Different
colors correspond to different types, and white is the ancestral type; when more than one color occupies a
deme, the colors are mixed, so that eventually, if all colors spread to all demes, the entire population will be
grey.

To test our prediction that the mean radius of the regions occupied by distinct mutations was captured
by our compound parameter, the characteristic length, we also compared the size of regions occupied
by each type in simulations to the theoretical predictions. We ran 818 simulations on linear grids of
1000 demes with nearest-neighbor migration at 45 different combinations of the following parameter values
chosen to obtain a good spread of characteristic lengths: migration probability
$m \in \{0.025, 0.05, 0.1, 0.2\}$, reproduction rate $r \in \{0.1, 0.2, 0.05\}$, local population size
$N \in \{200, 600, 1000\}$, and selection coefficient $s \in \{0.08, 0.16, 0.24, 0.32, 0.4\}$. Simulations were
stopped at the first time the ancestral type went extinct, and the proportions of the total population that
were of each type were computed. In a one-dimensional range, we expect the mean area occupied by a type (and
hence, proportion of the total population, since all simulations had the same number of demes) to increase
linearly with the characteristic length, with a slope depending on the details of the model (which we did not
attempt to compute). The resulting distributions of proportions are shown as a function of characteristic
length in Figure 8 (unshaded boxplots), showing a good linear fit of the mean area to the characteristic length
(solid line), confirming that our theoretical predictions fit the simulated model well.

Figure 7: Simulations show constant-speed waves. Typical spread of a wave in a 1D simulation,
showing quick settling down to a constant speed. On the left are plots of the range diameter through time
in ten independent realizations; on the right are five time slices of a single realization. In each case, the
mutation rate was set to zero and a mutant was seeded in the center of the population to start the wave.



Limits of the model. In our view, one of the more serious approximations we make is that once the allele
is introduced to a deme it quickly reaches its equilibrium frequency locally. This allows us to assume that
the time delay between when the selected allele arises and when it begins to spread as a wave of known
speed is short and relatively constant across alleles. This approximation is important because it underlies
our assumption of allelic exclusion, i.e. that in areas reached by a spreading mutation, subsequent mutations
do not arise and escape drift fast enough to also spread. In both cases "fast" must be quicker than the
time scale on which mutations arise and escape drift locally. To demonstrate the shortcomings of this
approximation, we ran an additional 90 simulations on the same linear grid with a Gaussian dispersal kernel
whose SD is 1/10 the range size (100 demes), chosen to violate this assumption. The results are shown
in Figure 8 (shaded boxplots). The local population size $N$ was set to 10,000, to make the characteristic
lengths comparable, and the other parameters were a subset of those chosen above ($m = 0.05$, $r = 0.1$, and
$s \in \{0.16, 0.24, 0.32, 0.40\}$). The observed proportion of the range occupied per type is far lower than
would be predicted from our characteristic length, which is expected if mutational exclusion does not hold.
Indeed, for these examples the width of the wave (which is approximately $\sigma/\sqrt{s}$ (Fisher, 1937;
Kolmogorov et al., 1937)) is around 1/3 the characteristic length, indicating that a significant number of
successful mutations will arise in a location where another type has already begun to occupy. In fact, we see
in Figure 8 that for these simulations the mean number of types does not appear to depend on $s$, as expected
in the panmictic model of Pennings and Hermisson (2006a). If the width of the wave was much smaller than the
characteristic length, this would not be a problem. We return to the question of when our approximations hold
in the

Figure 8: Empirical mean range sizes occupied by distinct types depend linearly on the
characteristic length. The results of 818 simulations at 45 distinct parameter combinations of $N$, $m$, $r$,
and $s$ and local dispersal (unshaded boxes) as well as 90 simulations at a larger value of $N$ and with
dispersal on the order of the species range (shaded, grey boxes). All had $\mu$ fixed at 10~^; see text for
parameter choices otherwise. For each set of parameter values, between 10 and 30 independent simulations were
carried out, and the areas (as a proportion of the total range) occupied by each distinct type were tabulated.
A boxplot of the resulting 45 distributions is shown, with the characteristic length computed using
Equation (3) on the horizontal axis. Since the characteristic length does not depend on $r$, some boxplots
overlap. The boxplots are standard (boxes extend from the first to the third quartiles), except that the means
are shown as a black box, since it is the mean occupied area which is expected to be linearly proportional to
$\chi$. Shown is the regression line of proportion occupied against characteristic length, using only
simulations with local dispersal.



3.2 Biological parameters and the characteristic length 

The best summary of the probability and spatial scale of parallel adaptation in our model is the characteristic 
length. The simple form of the characteristic length, especially in the classical Fisher-KPP case, allows us to 
find how the various parameters affect the spatial scale and probability of parallel adaptation. For example, 
doubling the local effective population density has a similar effect to halving the standard deviation of the 
dispersal distance. Intuitively, this reflects the fact that halving the standard deviation of the dispersal kernel 
doubles the time it takes a mutation to spread a given distance, which in terms of the chance of parallel 
adaptation is equivalent to doubling the population density. Note, however, that these two changes are not 
equivalent in terms of the time it takes adaptive alleles to spread throughout the entire range — doubling the 
density will decrease the time until adaptation, while halving dispersal distances will increase the time, as 
seen in Equation ^. Furthermore, it is intuitively clear (and this can be made rigorous) that in a region 
large relative to the characteristic length, many parallel mutations are very likely. 
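
The scaling argument in this paragraph can be checked numerically. The sketch below assumes a Fisher-KPP form for the characteristic length, χ = (v/(λω(d)))^(1/(d+1)), with wave speed v = σ√(2s), rate of successful mutations λ = 2sμρ per unit area per generation, and ω(d) the volume of the unit ball in d dimensions. This form, its constants, and the function name `characteristic_length` are assumptions of the sketch and are not quoted from this section, so the absolute values it returns should not be read as reproducing the figures or Table 1; only the way the result scales with σ, ρ and s is of interest here.

```python
import math

def characteristic_length(sigma, s, mu, rho, d=2):
    """Characteristic length under an ASSUMED Fisher-KPP form (hypothetical helper)."""
    v = sigma * math.sqrt(2 * s)   # assumed wave speed, distance per generation
    lam = 2 * s * mu * rho         # assumed rate of successful mutations per area per generation
    omega = math.pi ** (d / 2) / math.gamma(d / 2 + 1)  # volume of the unit ball in d dimensions
    return (v / (lam * omega)) ** (1 / (d + 1))

base = characteristic_length(sigma=100.0, s=0.01, mu=1e-8, rho=2.0)
double_density = characteristic_length(sigma=100.0, s=0.01, mu=1e-8, rho=4.0)
half_dispersal = characteristic_length(sigma=50.0, s=0.01, mu=1e-8, rho=2.0)
double_selection = characteristic_length(sigma=100.0, s=0.02, mu=1e-8, rho=2.0)

# Doubling density and halving dispersal shrink the length by the same factor.
print(double_density / base, half_dispersal / base)
# Doubling the selection coefficient changes the length only slightly.
print(double_selection / base)
```

Under this assumed form the first two ratios are both 2^(-1/3), while doubling s changes the length only by a factor of 2^(-1/6), which is the sense in which the result is insensitive to the strength of selection.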

With this in mind, we here compute characteristic lengths at some representative parameter ranges and 
in some specific situations, to give a sense of the conditions under which parallel adaptation is likely. Consider 
a population spread over an area the size of Europe, here defined as the area west of the Ural mountains, 
which has an area of about 10⁷ km² (about a third the area of Africa) and is about 4,000 km across in 
the longest direction. Therefore, if χ < 4000 km, multiple mutations are likely. We take two population 
densities, ρ = 2 and ρ = 0.002, chosen respectively to reflect the human population density in 1000 BCE 
(McEvedy and Jones, 1978) and the long-term ancestral effective population size of humans, N_e = 10000, 
spread out over the area of Europe. We vary the mutation rate between 10⁻⁸ and 10⁻⁴ mutations per 
generation, representing a mutational target of between 1 and 10,000 base pairs (the typical mutation rate 
in humans is ~10⁻⁸ per base per generation). We keep the selection coefficient s = 0.01 fixed, since the 
results are fairly insensitive to changes in s. We vary the typical dispersal distance σ between one and 
one hundred kilometers (in line with human dispersal estimates (Wijsman and Cavalli-Sforza, 1984)). 
Finally, we compare several different dispersal kernels, rescaled so that each shares a common interquartile 
range, to see how fatness of tails affects the results (the stretched exponential, with α = 1/2, matches that 
used by Novembre et al. (2005)). 

Figure 9: Characteristic lengths under various models at different parameter values. If the species 
range is large relative to the characteristic length, parallel adaptation will occur with high probability. In each 
plot, the per-individual mutation rate is along the horizontal axis. The two plots have the scale parameter 
σ of each dispersal kernel set at 1 km and 100 km, respectively. Line type refers to different dispersal kernels: 
Gaussian (or short-range) dispersal, stretched exponential with α = ^ and |, and Cauchy (stable with α = 1). 
In each case the distributions were normalized to match at the 95th quantile. Color denotes different values 
of ρ, taken to be two possible values for the appropriate density of the ancestral human population in Europe 
(see text). 

From Figure 9 we see that the expected degree of parallel adaptation depends strongly on the parameters. 
At the lower population density (black lines), it is not until the mutational target is on the order of hundreds or 
thousands of base pairs (depending on σ) that independent origins are likely; at smaller mutation rates 
parallel adaptation is highly unlikely. At the higher population density (red lines), independent origins 
are likely even if the mutational target is only a few base pairs; at larger mutation rates the characteristic 
length falls to only tens or hundreds of kilometers, indicating a ubiquity of mutational origins. This is to be 
expected, because with ρ = 2, an area the size of Europe would have about 10⁷ individuals, so a mutation rate of 
10⁻⁴ per generation would produce about 1,000 mutant alleles in a single generation. Of course, a characteristic 
length that is on the order of the dispersal distance means that the model described here (a smooth circular 
outward spread of alleles) is no longer a good approximation to the true dynamics. 

It is also clear from Figure 9 that over this range of parameters the choice of dispersal kernel does not 
affect the number of mutations as strongly as do other parameters. The characteristic length does vary 
between kernels, but not generally enough to change the conclusion that parallel adaptation is likely or 
unlikely. At certain parameter combinations the difference is large, but not over much of the biologically 
realistic range we have examined. 

Figure 10: Expected time to the arrival of an adaptive mutation, in generations, under various 
models at different parameter values. Note the different scales on the vertical axes. In each plot, the 
per-individual mutation rate is along the horizontal axis, and other details are as for Figure 9. 

It is also useful to look at the expected amount of time that the population will take to adapt, which we 
display in Figure 10 over the same set of parameters. The first observation is that when parallel evolution is 
likely, it will also take a reasonably small amount of time; under the conditions where adaptation will take 
an exceptionally long time, most of the time is spent waiting for a single successful mutation. The different 
dispersal kernels, however, lead to more disparate times to adaptation. Of particular note is the Cauchy 
kernel, under which, if the dispersal distance is small, the spread far outpaces that under the other dispersal 
distributions. As with small characteristic lengths, very small times to adaptation should not be taken too 
seriously, indicating only that the spread happens quickly. 



3.3 Applications 

3.3.1 Why are there so few recent Eurasian-wide sweeps in humans? 

The majority of recently arisen selected alleles, as identified from e.g. haplotype patterns, seem to be 
geographically restricted (Voight et al., 2006; Wang et al., 2006; Pickrell et al., 2009; Coop et al., 2009), 
occupying broad geographic areas such as Europe, East Asia or Western Eurasia. On the basis of this 
observation, Coop et al. (2009) and Pickrell et al. (2009) argued that few selected alleles have recently 
swept to fixation across Eurasia. There are at least three possible explanations for this pattern: 1) there 
has not been sufficient time for these alleles to spread; 2) the selection pressures are at a local scale, not 
Eurasia-wide; 3) the selection pressures are shared across Eurasia and different populations have adapted 
in parallel. These explanations are not mutually exclusive, and to distinguish between them we will need 
a much more in-depth knowledge of the phenotypes underlying the putative selective sweeps. However, the 
theory developed here can shed light on whether the last hypothesis is plausible. 

We use a subset of the parameters in the previous section: a Gaussian kernel with σ = 100 km, a 
stretched exponential kernel with α = 1/2, and stable distributions with several values of α. As before, the 
non-Gaussian kernels are parameterized to have the same interquartile range as the Gaussian kernel with the 
same σ. Table 1 displays our computed characteristic lengths across the above combinations of parameters, 
at mutation rates of both 10⁻⁸ per generation (a single unique base pair change) and 10⁻⁵ per generation 
(a 1000 base pair target, roughly the number of coding bases in a gene). The different dispersal kernels give 
roughly similar characteristic lengths, suggesting that these numbers are relatively robust to the choice of 
kernel. 

Multiple mutational origins are likely if the characteristic length is shorter than the physical dimensions 
of the region. Eurasia measures over 8000 km across, and so Table 1 suggests that multiple origins at a 
single base pair are very unlikely at the lower population density. On the other hand, if the mutational 
target is large, then multiple origins are likely at low densities, while at high densities independent origins 
are ubiquitous. The complementary cases of (ρ = 2, μ = 10⁻⁸) and (ρ = 0.002, μ = 10⁻⁵) give identical 
characteristic lengths of about 3000 km, although the time scale on which the mutations spread differs. Thus 
for these two parameter combinations we can expect a few mutations to dominate within continents, and 
multiple mutations to be common in a population spread across an area the size of Eurasia. Obviously 
these calculations are very crude, as population densities vary through space and time, and dispersal across 
continents is not simply a function of geographic distance and individual dispersal. Nevertheless, these 
calculations suggest that it is plausible that for adaptive traits with reasonable mutational targets (e.g. a 
change anywhere within a gene or pathway) even low population densities can lead to parallel adaptation 
across an area the size of Eurasia, and higher densities almost certainly will. 

We note that as human population densities have increased dramatically over time, so too has the 
probability of parallel adaptation. It is interesting therefore to note that a number of recent human adaptations 
(e.g. sickle cell alleles) involve repeated changes at very small mutational targets in relatively small geographic 
areas, while older adaptations from single changes (e.g. skin pigmentation) are more broadly spread. 

Table 1: Characteristic lengths, in kilometers, at various parameter combinations and with various dispersal 
distributions. Stable α refers to a stable dispersal distribution with parameter α. The remaining parameters 
were: s = 0.01, r = 1, ^ = 1, and σ = 100 km. 



3.3.2 Parallel adaptation of the sickle cell allele 

The sickle cell allele HbS at the beta-globin gene in humans provides a particularly interesting case of putative 
parallel adaptation. The HbS allele (β6 Glu→Val) has been driven to intermediate frequencies by selection 
within the past 10,000 years due to the increased resistance to malaria of heterozygotes for the allele (Haldane, 
1949; Allison, 1954; Kwiatkowski, 2005; Currat et al., 2002). The HbS allele is present on at least four 
major distinct haplotypes in Africa, each at intermediate frequency within a different geographic region; the 
haplotypes are named after the population sample where they were first discovered (Central African Republic, 
Senegal, Benin and Cameroon). This is consistent with multiple origins of this single-base pair change. Note 
that a distinct malaria resistance allele, HbC (β6 Glu→Lys), has also arisen in Africa at the same codon 
as the HbS allele (Trabuchet et al., 1991; Agarwal et al., 2000; Wood et al., 2005a), increasing our 
confidence that the mutational input was high enough to allow multiple types to arise. However, Flint et al. 
(1998) thought the hypothesis of multiple new mutations arising at a single base pair was extremely unlikely, 
and proposed that it was more likely that gene conversion had spread a single mutation across multiple 
haplotypes. 

The theory we have developed can be used to assess the plausibility of the multiple mutational origins of 
the sickle cell allele, by exhibiting parameter combinations that yield characteristic lengths consistent with 
the separation of the sample locations. (Recall that the wave of advance, and thus also our model, works 
in the case of heterozygote advantage (Aronson and Weinberger, 1975).) The different HbS haplotypes 
co-occur within a few thousand kilometers of each other (see Table 5 of Flint et al., 1998), noting that 
these locations are unlikely to reflect the geographic mutational origins, and that mutations will have been spread 
by large population movements. As the HbS changes occur at a single base pair, the mutation rate would 
have been ~10⁻⁸, and we take s = 0.05 (as in Currat et al., 2002). If human dispersal at that time 
was well approximated by a Gaussian kernel with σ = 100 km, then a characteristic length of ~1000 km 
would require an effective density of individuals of ρ ≈ 25 km⁻², while if σ = 10 km, then we would require 
only ρ ≈ 2.5 km⁻². This latter set of parameters does not seem unrealistic considering our knowledge of 
population density and dispersal parameters, so our model suggests that the hypothesis of multiple origins 
is not unreasonable. 



4 Discussion 

We have presented theoretical results on the prevalence of parallel mutations during the sweep of an adapted 
allele across the geographic range of a species. Parallel adaptation can occur in a species adapting to a shared 
selection pressure simply because selected alleles may spread slowly enough to allow other mutations to arise 
and spread elsewhere in the species range. The distribution of the number of unique types is given implicitly 
by a certain Poisson process, which we summarized by computing the mean values of several important 
quantities. Many features of the continuous model can be captured by a characteristic length, a compound 
parameter that combines the dispersal parameter, the mutation rate and the population density, and our 
simulations confirm that this is a very useful predictor of model behavior. This characteristic length can be 
obtained under a wide variety of dispersal kernels, as long as the speed at which selected alleles spread under 
those kernels can be computed even if that speed is not constant. The regions occupied by distinct types have 
dimensions on the order of the characteristic length, so if the species range is at least as large as the characteristic 
length, then parallel adaptation is likely. The expected number of parallel mutations is a simple function 
that, as intuition would predict, decreases with dispersal rate and increases with mutation rate. Somewhat 
counterintuitively, the results are relatively insensitive to the strength of selection, as selection both hastens 
the spread of an allele and conversely increases the chances that a new mutation escapes drift. 

Does parallel adaptation require strong population structure? Our results confirm the intuitive 
idea that species with low levels of dispersal (and hence strong geographic genetic structure) may adapt to 
global selective challenges by parallel adaptation at separated geographic locations. However, the likelihood 
of parallel adaptation depends on dispersal strength relative to population density (ρ/σ), and so geographic 
parallel adaptation may be an important factor even in species which appear panmictic at neutral markers. 
The observation of relatively little neutral geographic genetic structure merely implies that genetic 
drift in sub-populations is slow compared to migration, which increases with both dispersal distance and 
local effective population density. (Neutral population structure is determined by Wright's neighborhood 
size, proportional to σ²ρ; see Charlesworth et al. (2003).) Increasing dispersal distance (σ) reduces the 
chance of parallel adaptation, while increasing population density (ρ) increases the chance of parallel 
adaptation. Therefore, an absence of strong geographic structure cannot rule out the possibility of geographic 
parallel adaptation. In this context it may be helpful to consider the global effective population size in a 
geographically spread population, a common estimator of which is the average coalescent time of pairs of 
sequences sampled across the species range. Such estimators increase both with lower dispersal distances and 
higher local effective population densities (Charlesworth et al., 2003). Thus, species with larger global 
effective population size are more likely to adapt by parallel mutation whether they are truly panmictic (as 
shown by Pennings and Hermisson (2006a)) or selection is dispersal limited, as discussed here. 



Signals in patterns of diversity While we have discussed our model results in terms of parallel adaptive 
changes at the same genetic locus, our results do not rely on an assumption of complete linkage between 
the selected alleles. The base pairs at which these changes occur can be contiguous, partially linked or 
completely unlinked; we merely require that the mutations are selectively equivalent. Thus our results 
apply equally well to selected alleles of similar phenotypic effect that have arisen in parallel at different 
genetic loci, e.g. mutations at different genes in the same pathway. There are a number of potential cases 
of parallel adaptation within a species where adaptive changes at different genes have produced similar 
phenotypes in different populations (Hoekstra et al., 2006; Steiner et al., 2009; Nachman et al., 2003; 
Hoekstra and Nachman, 2003). For example, there is evidence for differences in the genetic basis of the 
adaptive response to shared selection pressures in European and East Asian human populations (Lao et al., 
2007; Norton et al., 2007; Edwards et al., 2010). 



If parallel adaptation has occurred, there are at least two potential signatures in patterns of genetic 
variation. The first is that, immediately after the sweep, the independent copies of the selected allele will 
form a spatial patchwork with patches of size ~χ. Within each patch, a different selected allele will 
predominate. Each of these mutations may have arisen on a different haplotype, especially if neutral genetic 
variation varies across the species range. This linked genetic variation will be swept up to high frequency 
locally and so also form a patchwork, with different haplotypes common within each patch. These patches 
may maintain sharp boundaries in allele frequencies between them for some time, and so may resemble local 
adaptation, despite the fact that the causal selection pressures are homogeneous (see Coop et al., 2009 
for discussion). This similarity may be further compounded since the boundaries between selected types 
will tend to occur at geographic barriers to migration, as selected alleles will temporarily be slowed there 
(Piálek and Barton, 1997). 

Over time these spatial patterns will be erased through the mixing action of migration. This spatial mixing 
by migrants will occur in a manner analogous to heat flow, on a time scale given by the diffusive parameter 
σ, the standard deviation of the distance between the birth locations of parent and child. In a geographic 
region of linear dimensions R, the patterns will become erased in a time of order R²/σ² generations. If the 
alleles have strongly deleterious epistatic (or dominance) interactions then their geographic mixing will be 
more complicated, and perhaps slowed. A potential example of this is offered by Williams et al. (2005) 
and Penman et al. (2009), who discuss the role of epistasis in preventing the geographic mixing of different 
malaria resistance alleles in humans. Indeed, Kondrashov (2003) has suggested that the relatively slow 
spread of selected alleles across a species range might allow Dobzhansky-Muller epistatic incompatibilities 
to arise in parapatry, potentially leading to speciation. 
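
As a rough worked example of the diffusive erasure time, which is of order R²/σ² generations for a region of linear dimension R and dispersal standard deviation σ, the sketch below evaluates it for a region 4,000 km across (the linear dimension used for Europe in Section 3.2) at dispersal distances of 1 km and 100 km. The function name `mixing_time` is hypothetical, and the result is an order-of-magnitude time scale only, with constants of proportionality ignored.

```python
def mixing_time(region_km, sigma_km):
    """Order-of-magnitude number of generations for diffusive mixing across a region."""
    return (region_km / sigma_km) ** 2

for sigma in (1.0, 100.0):
    # Region size and dispersal distance must be in the same units (here km).
    print(sigma, mixing_time(4000.0, sigma))
```

With these illustrative values the time scale is 1,600 generations at σ = 100 km and 16 million generations at σ = 1 km, so the patchwork can persist far longer in poorly dispersing populations.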

The second, possibly longer-lasting, pattern is the partition of alleles and their linked haplotypic variation. 
If genetic drift is slower than spatial mixing, then the initial partition of alleles is spread out uniformly over 
space long before any alleles disappear or fix through genetic drift. After mixing by migration, the resulting 
patterns would be characterized by a region of reduced variation and longer LD surrounding each selected 
locus. If the independent selected changes occur at the same gene, this will resemble a soft sweep in a single 
panmictic population as described by Pennings and Hermisson (2006a,b). Indeed, it is possible that some 
of the characterized putative soft sweeps (e.g. Schlenke and Begun, 2005) arose in this manner. If the 
changes occur at different genes, there will be a set of partial sweeps at the different loci. Such patterns 
could potentially explain the apparent excess of partial sweeps, compared to full sweeps, seen in human 
populations (Coop et al., 2009; Pritchard et al., 2010). Thus parallel mutation would allow a population 
to maintain a higher level of heterozygosity at the selected loci than would sweeps from a single mutational 
origin. A related argument has been made by Goldstein and Holsinger (1992), who discussed the role 
of genetically redundant mutations and isolation by distance in the maintenance of genetic variation in 
quantitative genetics models (see also Lande (1991) and Kelly (2006)). 



Exclusion and spatial structure Two of our main assumptions (mutational exclusion and our resolution 
of the "delicate situation" in Section 2.3) lead to an underestimate of the number of independent, 
parallel adaptations, while others, such as deterministic spread of the waves, only affect the variance of 
the number of parallel adaptations. It is also useful to compare with existing results on panmictic 
populations. As described in Section 3.1, simulations using long-distance dispersal that resulted in a nearly 
panmictic model produced a much higher number of parallel adaptations than predicted from our theory. 
This is in line with the results of Pennings and Hermisson (2006a), who have shown that within a single 
large randomly-mating population, a high rate of introduction of selectively equivalent mutations can allow 
multiple mutations to escape low frequency before the first to arise fixes in the population. Extrapolating 
from their results we see that if population density is high enough in an area where a spreading mutation 
has begun to establish, then other mutations could arise and concurrently spread, undermining our 
assumption of allelic exclusion. The higher migration is relative to reproductive rate, the closer a model is to 
the panmictic model of Pennings and Hermisson (2006a). In general, since it neglects spatial structure, 
we expect the panmictic model to underestimate the true degree of parallel adaptation as well, so if the 
panmictic model predicts more parallel adaptations than does our model, we expect the truth to be closer 
to the panmictic predictions. Future work could relax our assumption that the mutations quickly reach 
equilibrium, allowing model predictions more accurate than either model. In any case, our results provide 
an underestimate of the prevalence of parallel adaptation, but with very widely dispersing species the results 
of Pennings and Hermisson (2006a) may be more appropriate. 

Selective equivalence Throughout this paper we have assumed that variation in selection coefficient will 
be much smaller than the strength of selection (as follows from our focus on parallel adaptation). However, 
it is unclear how often strict selective equivalence holds in practice. Mutations that have a convergent 
effect on a phenotype of interest may differ in their pleiotropic effects, and even identical changes at the 
same base pair may have somewhat different effects due to linked variation. 

However, the characteristic length only depends weakly on s, so the effect of small differences in selection 
coefficient should be minimal. Further, our results on stochastic waves (Section 2.4) suggest that if there 
are only small differences, it is reasonable to use the mean selection coefficient, and that the mutations will 
initially form a patchwork with the average size of a patch given by this characteristic length. The time 
scale over which the patchwork persists will be affected by the differences in selection coefficient. Suppose 
for simplicity that each newly arising type chooses one of only two distinct selection coefficients, either s₁ 
(a "weak" mutation) or s₂ > s₁ (a "strong" mutation) that interact additively. The original patchwork is 
erased as the stronger mutations push into, or arise within and overtake, areas already occupied by the weak 
mutations. They do this at speed σ√(2(s₂ − s₁)), so the time-scale over which the original tessellation is 
erased is of order χ/(σ√(2(s₂ − s₁))). The patterns in diversity resulting from multiple types with different 
selection coefficients will depend on the linkage of the loci underlying the different types. In some cases (e.g. 
full linkage), the stronger allele may push the other out of the population as it spreads; while in other cases 
(e.g. no linkage) the stronger allele will spread throughout the population but not disrupt the spatial pattern 
of the weaker alleles. 
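
To get a feel for this time scale, the sketch below evaluates the speed σ√(2(s₂ − s₁)) at which a stronger type displaces a weaker one, and the resulting erasure time χ/(σ√(2(s₂ − s₁))). The numerical values (σ = 100 km, χ = 1000 km, s₁ = 0.01, s₂ = 0.02) and the function names are assumptions chosen purely for illustration, not estimates from data.

```python
import math

def invasion_speed(sigma, s_weak, s_strong):
    """Speed at which the stronger type pushes into territory held by the weaker type."""
    return sigma * math.sqrt(2 * (s_strong - s_weak))

def erasure_time(chi, sigma, s_weak, s_strong):
    """Order-of-magnitude generations to erase a patchwork with patches of size chi."""
    return chi / invasion_speed(sigma, s_weak, s_strong)

# Illustrative values only: sigma and chi in km, selection coefficients per generation.
print(invasion_speed(100.0, 0.01, 0.02))
print(erasure_time(1000.0, 100.0, 0.01, 0.02))
```

Because the time grows like (s₂ − s₁)^(-1/2), halving the difference in selection coefficients lengthens the persistence of the original tessellation only by a factor of √2.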

Outlook Our results demonstrate that if dispersal is indeed a limiting factor in the spread of selected 
alleles, then in large geographically spread populations, parallel adaptation will be common. As yet there 
are relatively few firm examples of parallel adaptive mutations within species, but we believe that this simply 
reflects the fact that we are just beginning to identify and understand mutations that contribute to adaptive 
phenotypes. It is notable that many of the cases of geographic parallel adaptation come from humans and 
the other species where phenotypes that have reasonably well-characterized genetic bases have been carefully 
studied (e.g. pigmentation, or drug and insecticide resistance). This suggests that further work on other 
species and phenotypes will uncover many more examples. 

Genes or pathways that harbor different mutations that have swept in non-overlapping parts of the 
species range will represent good candidates for geographic parallel mutation. One difficulty in interpreting 
these candidates will come in understanding whether they are approximately genetically redundant with 
respect to the phenotypes that selection has acted on, or if they dominate in different portions of the species 
range because they represent locally adapted alleles. The former explanation may be appropriate as a null 
hypothesis, as it requires fewer differences in selection pressures across the range, and requires only that 
populations are geographically separated. 

Aside from identifying candidates for geographic parallel mutation, a productive line of research is to 
understand whether population densities and dispersal patterns are conducive to their occurrence. The 
spatial density of individuals within a population is likely to fluctuate dramatically over time, so the long 
term effective population size for the species is likely to be a very poor estimate for the rate at which selected 
mutations arise, especially in populations that have experienced recent rapid growth. The move towards 
genome-wide population resequencing data will allow the recent effective population size to be estimated 
from the rare alleles, and the spatial spread of these rare alleles will be informative about recent dispersal 
parameters (e.g. Novembre and Slatkin, 2009). 



As yet we know relatively little about the full impact of long-distance dispersal, a situation that will 
hopefully be improved by the increasing spatial and genomic resolution of population genetic studies (e.g. 
Novembre et al., 2008; Auton et al., 2009), along with the methods to accurately identify subtle signals 
of gene flow in such data sets (e.g. Price et al., 2009). In many species rare, extremely long distance 
migrants occur, which can have strong effects on the speed and patchiness with which the wave advances 
(Lewis and Pacala, 2000; Clark, 1998; Kot et al., 1996). While in numerical examples we did not see a 
strong effect on the likely amount of parallel mutation, this conclusion does not extend to all parameter values, 
and it is difficult to compare parameters across different dispersal distributions. If migration is not spatially 
restricted (e.g. the fully connected "island" model), then we expect the dynamics to be significantly different. 
There are a number of examples of very rapid spread of selected alleles (e.g. in malaria, Wootton et al., 
2002; Roper et al., 2004; Anderson and Roper, 2005), and of vertically transmitted parasites (Kidwell, 
1983; Turelli and Hoffmann, 1991), perhaps suggesting that the extent to which limited dispersal allows 
parallel adaptation is still a very open question, and likely to vary between species. 

A Stochastic waves 

Here we continue on where Section 2.4 left off, to demonstrate when, and in what sense, expressions for the 
"mean wave" suffice, even if the true behavior is stochastic. 

We need to know the probability that an isolated mutation arising at x will first cover a point at y after 
time t, which we define to be $q(t, r)$, with $r = \|x - y\|$ the distance from x to y. We assume that this quantity 
only depends on the distance between x and y, regardless of the direction. This does not require that waves 
are circular, only that there are no "preferred directions". We also assume that each wave's stochasticity is 
independent of the Poisson process of locations and of the other waves. 

The mean time until a point $x_0$ is reached by the adaptive type can be found in a manner entirely 
analogous to the deterministic case. Before, all waves were the same, so we only needed to keep track 
of their origins; now mutations can be thought of as having different "types", chosen independently and 
randomly from a type space V, so that each point now consists of a triple $(X_i, T_i, V_i)$, recording the location 
$X_i$ and time $T_i$ of the mutational origin, and what type of wave $V_i$ it will have, respectively. The types $V_i$ are 
chosen independently from some common distribution. For a simple example, if only the speed is random, 
then V is the set of positive real numbers, $V_i$ simply records the speed of the wave, and $q(t, r) = \mathbb{P}\{V_i > r/t\}$. 
Just as before, the number of points in any region of time, space, and type space is Poisson distributed, 
with mean proportional to its measure. We now measure "area" in the third coordinate using the distribution 
of $V_i$, so that the number of mutations that occur in a spatial region of area A, over a time interval of length 
t, and with type lying in some set U is Poisson distributed with mean $\lambda A t\, \mathbb{P}\{V_i \in U\}$. The probability that 
the mutant type has not reached x by time t is the exponential of minus the total measure of possible mutants that 
would have reached $(x, t)$. This measure can be written as $\lambda h(t)$, where 

$$h(t) = \int_0^t \int_0^\infty d\,\omega(d)\, r^{d-1}\, q(u, r)\, dr\, du, \qquad (13)$$

so that $h(t)$ is also the mean volume in space-time swept out by an expanding, unobstructed mutation over 
t time units. Just as before, 

$$\mathbb{P}\{\tau > t\} = \exp(-\lambda h(t)). \qquad (14)$$

Furthermore, following the same reasoning as for equation (9), the mean density of types is given by 

$$\lambda \int_0^\infty \mathbb{P}\{\tau > t\}\, dt = \lambda\, \mathbb{E}[\tau]. \qquad (15)$$

Since $h(t)$ is the mean volume swept out over t time units, the equations for $\tau$ and for the mean density of 
types are the same as in the deterministic case, except the path of the wave has been replaced by the path of 
the mean wave, in this sense. In other words, in computing the mean density of types, we may replace the 
path of the wave by its mean path. 
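
As a check on this claim, the sketch below takes d = 2 and waves whose only random feature is a speed V drawn uniformly from an interval, so that q(t, r) = P{V > r/t}. It numerically integrates 2πr q(u, r) over radii r > 0 and times 0 < u < t, which is the mean space-time volume h(t) in two dimensions, and compares the result with π E[V²] t³/3, the volume of a cone whose squared speed has been replaced by its mean. The uniform speed distribution, the grid size and all function names are assumptions made for illustration.

```python
import math

V_LO, V_HI = 5.0, 15.0  # assumed range of wave speeds (distance per generation)

def q(t, r):
    """P{V > r/t} for V uniform on [V_LO, V_HI]: chance a wave has covered distance r by time t."""
    if t <= 0:
        return 0.0
    speed_needed = r / t
    if speed_needed <= V_LO:
        return 1.0
    if speed_needed >= V_HI:
        return 0.0
    return (V_HI - speed_needed) / (V_HI - V_LO)

def h(t, steps=300):
    """Midpoint-rule estimate of the double integral of 2*pi*r*q(u, r) over 0 < u < t, r > 0."""
    du = t / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * du
        r_max = V_HI * u  # q(u, r) is zero beyond this radius
        dr = r_max / steps
        inner = sum(2 * math.pi * (j + 0.5) * dr * q(u, (j + 0.5) * dr) for j in range(steps))
        total += inner * dr * du
    return total

mean_sq_speed = (V_LO**2 + V_LO * V_HI + V_HI**2) / 3  # E[V^2] for the uniform distribution
t = 10.0
print(h(t), math.pi * mean_sq_speed * t**3 / 3)
```

The two printed numbers agree up to discretization error, which illustrates that only the mean swept volume, and not the full distribution of wave speeds, enters the probability that a point is still unreached at time t.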

B Wave speeds for some fat-tailed kernels 



Here we apply the theory of Section 2.3 to the stable and the stretched exponential families of dispersal
kernels. We do not dwell on the justification for expressions of wave expansion profiles of different dispersal
kernels, preferring instead to take this as input into the theory. For the interested (or suspicious) reader,
we outline here how the expressions are arrived at, following Kot et al. (1996), with the minor addition
of Equation (20). In each case, an expression for the wave speed is arrived at as follows. Suppose that
there is a population of size $N$ at every point, and that mutant organisms have first a dispersal stage,
where they disperse a distance according to a distribution $k(\cdot)$, then undergo density-dependent growth,
with a population of size $n(x)$ at location $x$ growing to $F(n(x))$. The ensuing discrete-time integrodifference
analogue to the Fisher-KPP equation studied e.g. in Kot et al. (1996) is



$$n(t+1, x) = \int k(x - y)\, F(n(t, y))\, dy.$$

As is common in the study of such nonlinear equations, we then linearize the equation by assuming that the
spread of the wave only depends on the behavior when the mutant type is rare, replacing $F(n)$ with $nF'(0)$,
and writing $F'(0) = (1 + s)$. See Kot et al. (1996) or Mollison (1972) for discussion of this assumption.



Then the number of mutants $n$ satisfies

$$n(t+1, x) = (1 + s) \int k(x - y)\, n(t, y)\, dy.$$

Since the Fourier transform takes convolutions to products, if we write $\hat n(t, \omega) = \int e^{i\omega x} n(t, x)\, dx$ and
$\hat k(\omega) = \int e^{i\omega x} k(x)\, dx$ for the Fourier transforms of $n$ and $k$ respectively, then

$$\hat n(t, \omega) = (1 + s)^t\, \hat n(0, \omega)\, \hat k(\omega)^t. \tag{16}$$
One might hope to then obtain information about the spread of the wave from this expression, or even by
explicitly inverting it. We will assume that the speed in two dimensions is the same as in one dimension,
which can be easily proved at least in some sense for many of these models.
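
The relation between iterating the linearized equation and multiplying Fourier transforms can be checked on a small discrete example. The sketch below is only an illustration under stated assumptions: space is replaced by a ring of 32 demes (so the convolution is circular), the kernel is a toy one with geometrically decaying weights, and the names `dft` and `step` are hypothetical helpers rather than anything from the paper.

```python
import cmath

def dft(values, sign=+1):
    """Naive O(N^2) discrete Fourier transform with kernel exp(sign * 2*pi*i*j*k/N).
    sign=+1 mirrors the e^{i omega x} convention; sign=-1 (divided by N) inverts it."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * cmath.pi * j * k / n)
                for j, v in enumerate(values))
            for k in range(n)]

def step(n_t, kernel, s):
    """One generation of the linearized equation on a ring:
    n(t+1, x) = (1+s) * sum_y kernel[(x - y) mod N] * n(t, y)."""
    size = len(n_t)
    return [(1 + s) * sum(kernel[(x - y) % size] * n_t[y] for y in range(size))
            for x in range(size)]

N, s, T = 32, 0.1, 5

# Assumed toy kernel: weight 2^(-distance) around the ring, normalized to sum to 1.
raw = [2.0 ** (-min(x, N - x)) for x in range(N)]
kernel = [w / sum(raw) for w in raw]
n0 = [1.0] + [0.0] * (N - 1)          # a single mutant at deme 0

# Direct iteration of the linearized recursion.
direct = n0
for _ in range(T):
    direct = step(direct, kernel, s)

# Same quantity from the product formula in Fourier space, then inverted.
n0_hat, k_hat = dft(n0), dft(kernel)
nT_hat = [(1 + s) ** T * a * b ** T for a, b in zip(n0_hat, k_hat)]
via_fourier = [z.real / N for z in dft(nT_hat, sign=-1)]

max_err = max(abs(a - b) for a, b in zip(direct, via_fourier))
print(max_err)
```

The two routes should agree to rounding error, and the total number of mutants grows by a factor $(1+s)$ per generation because the kernel sums to one.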

B.1 Stretched exponential

A family of dispersal kernels with moments but without moment generating functions is the family sometimes
called "stretched exponentials", whose probability density function is

$$k(x) = \frac{\alpha\, c_\alpha}{2\,\Gamma(1/\alpha)} \exp\left(-|c_\alpha x|^{\alpha}\right)$$



for $0 < \alpha < 1$. If $\alpha = 1$ and $d = 1$ this is the Laplace distribution; more generally the distribution is
sometimes also called the "error distribution". Here $\alpha$ controls the decay of the tails and $c_\alpha$ is a positive
constant depending on $\alpha$, chosen so that the IQR matches that of the standard Gaussian. This distribution
with $\alpha = 1/2$ has been used by Novembre et al. (2005) for human dispersal, and Kot et al. (1996) showed
that it gave the best fit out of a few choices to dispersal data for Drosophila pseudoobscura. We introduce
the scaling parameter $\sigma$ by replacing $k(x)$ with $k(x/\sigma)/\sigma$, as usual.



Following the reasoning above, Kot et al. (1996) argued that the stretched exponential dispersal gives a
wave that accelerates with a power of $t$:

$$f(t) \simeq \sigma\, (ts)^{1/\alpha} / c_\alpha. \tag{17}$$
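
To make the normalization and the accelerating front concrete, the sketch below computes the constant $c_\alpha$ by matching the upper quartile of the kernel to that of the standard Gaussian, and then evaluates $f(t) = \sigma (ts)^{1/\alpha}/c_\alpha$. It assumes the one-dimensional density $k(x) = \alpha c_\alpha \exp(-|c_\alpha x|^\alpha) / (2\Gamma(1/\alpha))$, for which $\mathbb{P}\{|X| \le q\}$ is the regularized incomplete gamma function $P(1/\alpha, (c_\alpha q)^\alpha)$. The function names and the bisection approach are illustrative choices, not the authors' procedure.

```python
import math

GAUSS_Q3 = 0.6744897501960817   # upper quartile of the standard Gaussian

def reg_lower_gamma(a, z, terms=200):
    """Regularized lower incomplete gamma function P(a, z), from the power series
    gamma(a, z) = z^a e^{-z} * sum_k z^k / (a (a+1) ... (a+k))."""
    if z <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for k in range(1, terms):
        term *= z / (a + k)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def c_alpha(alpha):
    """Constant making the kernel's quartiles equal +/- GAUSS_Q3 (same IQR as the Gaussian)."""
    a = 1.0 / alpha
    lo, hi = 0.0, a + 10.0            # bracket for z = (c_alpha * q)^alpha
    for _ in range(200):              # bisection on P(a, z) = 1/2
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(a, mid) < 0.5:
            lo = mid
        else:
            hi = mid
    z = 0.5 * (lo + hi)
    return z ** a / GAUSS_Q3

def front(t, alpha, sigma, s):
    """Front position f(t) = sigma * (t*s)^(1/alpha) / c_alpha, as in Equation (17)."""
    return sigma * (t * s) ** (1.0 / alpha) / c_alpha(alpha)

print(c_alpha(1.0), c_alpha(0.5))
print(front(10, 0.5, 1.0, 0.05), front(20, 0.5, 1.0, 0.05))
```

For $\alpha = 1$ the kernel is the Laplace distribution and the computed constant should equal $\log 2$ divided by the Gaussian quartile; for $\alpha = 1/2$ doubling $t$ multiplies the front position by $2^{1/\alpha} = 4$.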

Because of this form, properties of the constant-speed case can be derived from the case $\alpha = 1$, after
substituting $v$ for $\sigma s / c_\alpha$. The characteristic length for this family of distributions has the following form:



where $C = \omega(d)\, c_\alpha^{\alpha}$ is a constant. Note that this is independent of $s$, unlike the constant-speed case.

In the remainder of this section we neglect the demographic parameters $r$ and $\xi^2$, assuming for instance
that the offspring distribution is Poisson with mean and variance equal to 1. It is fairly straightforward to
factor them in, but for simplicity we omit them. The variable $r$ may still appear, used as a radius; this should not
cause confusion.

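Using the fact that, for $p > -1$ and $q, c > 0$,

$$\int_0^\infty t^{p}\, e^{-c t^{q}}\, dt = \frac{1}{q}\, c^{-(p+1)/q}\, \Gamma\!\left(\frac{p+1}{q}\right),$$

where $\Gamma$ is the gamma function, we can compute that

$$h(t) = \frac{\alpha\, \omega(d)}{d + \alpha} \left(\frac{\sigma}{c_\alpha}\right)^{d} s^{d/\alpha}\, t^{(d+\alpha)/\alpha},$$

and apply the results of Section 2.3. Therefore, the mean area occupied by a typical mutation is

$$\mathbb{E}[A] = \left( \lambda \int_0^\infty \exp(-\lambda h(t))\, dt \right)^{-1} = \frac{1}{\Gamma\!\left(1 + \frac{\alpha}{d+\alpha}\right)} \left(\frac{\alpha\, \omega(d)}{d+\alpha}\right)^{\alpha/(d+\alpha)} \left(\frac{\sigma}{c_\alpha}\right)^{d\alpha/(d+\alpha)} s^{d/(d+\alpha)}\, \lambda^{-d/(d+\alpha)}.$$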
Recall also that the mean time until a sample point is occupied satisfies

$$\mathbb{E}[\tau] = \frac{1}{\lambda\, \mathbb{E}[A]},$$

which gives an idea of the time scale the process happens on. The factor of $\lambda$ appears because of our time
scaling.

Also, the moments of the distance X of a sample point to the origin of its mutation have a nice form: 



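$$\mathbb{E}[X^n] = \frac{\lambda\, d\, \omega(d)}{d + n} \int_0^\infty f(t)^{d+n} \exp(-\lambda h(t))\, dt = \frac{d}{d+n}\, \Gamma\!\left(\frac{d + n + \alpha}{d + \alpha}\right) \left( \frac{(d + \alpha)\, s\, \sigma^{\alpha}}{\lambda\, \alpha\, \omega(d)\, c_\alpha^{\alpha}} \right)^{n/(d+\alpha)}.$$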
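
The summary quantities of this subsection can be checked by direct numerical integration. The sketch below assumes the front $f(t) = \sigma (ts)^{1/\alpha}/c_\alpha$, takes $h(t) = \int_0^t \omega(d) f(u)^d\, du$, and uses $\mathbb{E}[\tau] = \int_0^\infty \exp(-\lambda h(t))\, dt$, $\mathbb{E}[A] = 1/(\lambda \mathbb{E}[\tau])$, and $\mathbb{E}[X^n] = \frac{\lambda d \omega(d)}{d+n} \int_0^\infty f(t)^{d+n} \exp(-\lambda h(t))\, dt$. The parameter values and helper names are hypothetical, and the closed forms in the code are the gamma-function evaluations of these integrals under those assumptions.

```python
import math

def omega(d):
    """Volume of the unit ball in d dimensions."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def front(t, alpha, sigma, s, c):
    """Assumed front position f(t) = sigma * (t*s)^(1/alpha) / c."""
    return sigma * (t * s) ** (1 / alpha) / c

def h(t, d, alpha, sigma, s, c):
    """h(t) = int_0^t omega(d) f(u)^d du, evaluated in closed form."""
    b = (d + alpha) / alpha
    return omega(d) * (sigma / c) ** d * s ** (d / alpha) * t ** b / b

def integrate(g, t_max, n=20000):
    """Midpoint rule on [0, t_max]."""
    dt = t_max / n
    return sum(g((i + 0.5) * dt) for i in range(n)) * dt

def numeric_summaries(lam, d, alpha, sigma, s, c, n_moment):
    b = (d + alpha) / alpha
    K = lam * h(1.0, d, alpha, sigma, s, c)      # lam * h(t) = K * t^b
    t_max = (50.0 / K) ** (1 / b)                # beyond this exp(-lam*h) < e^-50
    surv = lambda t: math.exp(-lam * h(t, d, alpha, sigma, s, c))
    mean_tau = integrate(surv, t_max)
    mean_area = 1.0 / (lam * mean_tau)
    moment = lam * d * omega(d) / (d + n_moment) * integrate(
        lambda t: front(t, alpha, sigma, s, c) ** (d + n_moment) * surv(t), t_max)
    return mean_tau, mean_area, moment

def closed_summaries(lam, d, alpha, sigma, s, c, n_moment):
    b = (d + alpha) / alpha
    K = lam * h(1.0, d, alpha, sigma, s, c)
    mean_tau = math.gamma(1 + 1 / b) * K ** (-1 / b)
    mean_area = 1.0 / (lam * mean_tau)
    scale = (d + alpha) * s * sigma ** alpha / (lam * alpha * omega(d) * c ** alpha)
    moment = (d / (d + n_moment)) * math.gamma((d + n_moment + alpha) / (d + alpha)) \
        * scale ** (n_moment / (d + alpha))
    return mean_tau, mean_area, moment

# Hypothetical parameter values, for illustration only.
params = dict(lam=1e-4, d=2, alpha=0.5, sigma=1.0, s=0.05, c=4.1763)
print(numeric_summaries(n_moment=1, **params))
print(closed_summaries(n_moment=1, **params))
```

A quick sanity check on the moment formula is that $n = 0$ returns 1, since every sample point is eventually reached by some mutation.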



B.2 Stable distributions 



For another example of fat-tailed dispersal distributions, we take the well-known family of symmetric stable
distributions, which are parameterized by a scaling exponent $0 < \alpha < 2$. Stable distributions arise naturally
in the generalization of the central limit theorem and as scaling limits for random walks with step distributions
whose tails have power-law decay like $x^{-\alpha}$ (so-called "Lévy walks"), and so are a natural choice for a dispersal
distribution. For recent discussion of modeling real dispersal with such distributions, see Reynolds (2008)
and Edwards et al. (2007). The best-known example is the Cauchy distribution ($\alpha = 1$). The case $\alpha = 2$
corresponds in some senses to the Gaussian distribution, but what follows does not apply in that case because
Equation (20) for the tail behavior does not hold.

Denote by $k_\alpha(x)$ the density function of the $\alpha$-stable distribution, again normalized to have the same
IQR as the standard Gaussian. To incorporate a scaling parameter analogous to the standard deviation, we
use the dispersal kernel $k_\alpha(x/\sigma)/\sigma$.

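In the case of a stable dispersal distribution, the Fourier transform (16) can be explicitly inverted. If we
begin with $n_0$ mutants at the origin (as a delta distribution), then the spread is itself given by the dispersal
kernel,

$$n(t, x) = n_0\, \frac{(1 + s)^t}{t^{1/\alpha}}\, k_\alpha\!\left(\frac{x}{\sigma t^{1/\alpha}}\right),$$

so if $x_t > 0$ is such that $n(t, x_t) = \epsilon n_0$, then

$$k_\alpha\!\left(\frac{x_t}{\sigma t^{1/\alpha}}\right) = \frac{\epsilon\, t^{1/\alpha}}{(1 + s)^t}.$$

The density of the stable distribution is not known in closed form for general $\alpha$, but its tail behaviour is. For
large $x$,

$$k_\alpha(x) \simeq \frac{\alpha \sin(\pi\alpha/2)\, \Gamma(\alpha)}{\pi |x|^{1+\alpha}}, \tag{20}$$

and so for large $t$ and small $\epsilon$,

$$x_t \simeq \sigma \left( \frac{t\, (1 + s)^t\, \alpha \sin(\pi\alpha/2)\, \Gamma(\alpha)}{\pi \epsilon} \right)^{1/(1+\alpha)},$$

giving $f(t) = \sigma \left( t (1 + s)^t \alpha \sin(\pi\alpha/2) \Gamma(\alpha) / (\pi\epsilon) \right)^{1/(1+\alpha)}$. Note that the asymptotic speed is no longer independent
of $\epsilon$: positions farther out on the wave front move faster. For our numerical purposes, we will take $\epsilon = 0.2$.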
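
The asymptotic front position can be compared with an exact calculation in the Cauchy case, where the density is explicit. The sketch below is an illustration under stated assumptions: it takes the front to be $f(t) = \sigma \left(t(1+s)^t \alpha \sin(\pi\alpha/2)\Gamma(\alpha)/(\pi\epsilon)\right)^{1/(1+\alpha)}$, defines the exact front for $\alpha = 1$ by solving $k_1(x_t/(\sigma t)) = \epsilon t/(1+s)^t$, and uses the unit-scale Cauchy density $k_1(y) = 1/(\pi(1+y^2))$ rather than the IQR-normalized one. The function names are hypothetical.

```python
import math

def stable_front(t, alpha, sigma, s, eps=0.2):
    """Asymptotic front position for alpha-stable dispersal (large t, small eps),
    obtained from the power-law tail of the stable density."""
    amp = (t * (1 + s) ** t * alpha * math.sin(math.pi * alpha / 2)
           * math.gamma(alpha) / (math.pi * eps))
    return sigma * amp ** (1 / (1 + alpha))

def cauchy_front_exact(t, sigma, s, eps=0.2):
    """alpha = 1 with k_1(y) = 1/(pi*(1+y^2)): solve k_1(x/(sigma*t)) = eps*t/(1+s)^t for x > 0.
    Only valid once (1+s)^t > pi*eps*t, i.e. once the threshold density is reached somewhere."""
    ratio = (1 + s) ** t / (math.pi * eps * t)
    return sigma * t * math.sqrt(ratio - 1)

for t in (200, 400):
    approx = stable_front(t, 1.0, 1.0, 0.05)
    exact = cauchy_front_exact(t, 1.0, 0.05)
    print(t, approx, exact, approx / exact)

# Smaller eps (farther out on the front) gives a larger front position.
print(stable_front(50, 1.5, 1.0, 0.05, eps=0.1) > stable_front(50, 1.5, 1.0, 0.05, eps=0.2))
```

The ratio of the approximate to the exact position should approach 1 as $t$ grows, since the tail approximation improves as the front moves farther into the tail of the kernel.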

An analytic expression for the characteristic length can be written using the Lambert W-function, but
it is more efficient to solve (5) numerically. The integrals giving other properties of the process are difficult
to do explicitly, but can be done numerically as well.